GoAWK: an AWK interpreter written in Go

853 views
Skip to first unread message

Ben Hoyt

unread,
Aug 24, 2018, 5:13:25 PM8/24/18
to golang-nuts
I recently wrote an AWK interpreter in Go: https://github.com/benhoyt/goawk

It's a pretty simple implementation: hand-rolled lexer, recursive-descent parser, and tree-walk interpreter. It's pretty complete according to the POSIX spec, and it passes the AWK ("one true awk") test suite as well as my own unit tests.

In some quick tests I ran, I/O speed is on a par or better than AWK but the interpreter itself is quite slow -- about 5x slower for a lot of things. I hope to add some proper benchmarks soon. I have a pretty good of why and how to fix it: variable lookup and assignment is slow, and I'm planning to fix by resolving more things at parse time.

One thing that's a bit funky about AWK is its type handling: string values can be real strings or "numeric strings" (numbers that came from user input). I'm currently passing the "value" struct (see interp/value.go) around by value. I still need to test if that's a good idea performance-wise or not.

I'd love to hear any comments, bug reports, or -- especially -- code reviews.

-Ben

Scott Pakin

unread,
Aug 27, 2018, 3:39:35 PM8/27/18
to golang-nuts
Once you have some proper benchmarks, it might be fun to compare GoAWK's performance to that of my awk package.  The package implements AWK semantics in Go so a typical program is far more verbose than it would be in AWK but integrates tightly with Go code (e.g., one can use arbitrary Go code within the body of an AWK action).  GoAWK seems a lot easier to use when that level of integration is not needed, however.

I don't know how much performance difference this makes in practice, but my value struct (also in a value.go file) lazily converts among strings, floats, and ints and caches the conversions.  I don't keep track of "the" type of a value (your typ field), just whether I have a currently valid string/float/int representation.

No need to change your lexer/parser at this stage, but I've recently grown quite fond of PEG parsers.  These are a lot like hand-rolled, recursive-descent parsers so they're relatively easy to wrap your head around and reasonably powerful but require less code/effort than actually rolling your own.  For Go, I've used pigeon for a few projects (e.g., edif2qmasm, for which a PEG parser is probably overkill).  I like pigeon, but I admit I didn't do a thorough analysis of all the available PEG parsers for Go before going with that one.

Nice work!

— Scott

Ben Hoyt

unread,
Aug 28, 2018, 9:06:22 AM8/28/18
to scot...@pakin.org, golang-nuts
Once you have some proper benchmarks, it might be fun to compare GoAWK's performance to that of my awk package.

Nice -- will do!

I don't know how much performance difference this makes in practice, but my value struct (also in a value.go file) lazily converts among strings, floats, and ints and caches the conversions.  I don't keep track of "the" type of a value (your typ field), just whether I have a currently valid string/float/int representation.

Interesting approach. I hope to try out different value representations, but from a quick profiling run I don't think that's the bottleneck right now (using maps for variables is).

No need to change your lexer/parser at this stage, but I've recently grown quite fond of PEG parsers.  These are a lot like hand-rolled, recursive-descent parsers so they're relatively easy to wrap your head around and reasonably powerful but require less code/effort than actually rolling your own.  For Go, I've used pigeon for a few projects (e.g., edif2qmasm, for which a PEG parser is probably overkill).  I like pigeon, but I admit I didn't do a thorough analysis of all the available PEG parsers for Go before going with that one.

I'll check them out.

Nice work!

Thanks!

-Ben

Ben Hoyt

unread,
Nov 17, 2018, 12:44:57 PM11/17/18
to golan...@googlegroups.com, Alan Donovan
I've "finished" my Go AWK interpreter and released v1.0.0 now. I've fixed several bugs, and also sped up the interpreter significantly, primarily by:

1) Resolving variable names to integers at parse time so we can do []value lookups instead of map[string]value lookups at runtime
2) Using normal error returns instead of panic/recover internally for interpreter errors (but oh for the proposed "check" keyword to make this less verbose!)
3) Using switch/case to switch on binary operation type instead of looking up the operation in a map of functions
4) Avoiding allocations in many instances

Here's a write-up about GoAWK for those that are interested:

https://benhoyt.com/writings/goawk/

It's pretty much a toy project, not being used for anything real, but it seems to be in a good shape. Feedback welcome!

-Ben


--
You received this message because you are subscribed to a topic in the Google Groups "golang-nuts" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/golang-nuts/kYZp3Q1KKfE/unsubscribe.
To unsubscribe from this group and all its topics, send an email to golang-nuts...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

kty...@gmail.com

unread,
Nov 17, 2018, 5:49:25 PM11/17/18
to golang-nuts
That looks nice!
I wonder if it makes sense, to expose more of the interpreter to go.
E.g.: register a user function or add an action written in go.

Alan Donovan

unread,
Nov 17, 2018, 6:04:11 PM11/17/18
to Ben Hoyt, golan...@googlegroups.com
This is great, both as an "étude"---a challenge for sharpening your technique---and as an exemplary write-up of the process of building something non-trivial and making it both correct and fast. Nice work.

I'm sure you knew already, but Peter Weinberger (the W in AWK) is on the Go team at Google.



To unsubscribe from this group and all its topics, send an email to golang-nuts+unsubscribe@googlegroups.com.

Ben Hoyt

unread,
Nov 18, 2018, 7:04:45 AM11/18/18
to Alan Donovan, golan...@googlegroups.com
Thank you. I knew Weinberger was at Google, but didn't know he was on the Go team -- cool!

To unsubscribe from this group and all its topics, send an email to golang-nuts...@googlegroups.com.

Tong Sun

unread,
Nov 22, 2018, 11:24:22 PM11/22/18
to golang-nuts


On Tuesday, August 28, 2018 at 9:06:22 AM UTC-4, Ben Hoyt wrote:
Once you have some proper benchmarks, it might be fun to compare GoAWK's performance to that of my awk package.

Nice -- will do!
 
Please post back when you've done that. 

I'm interested to know. Thx. 

Michael Jones

unread,
Nov 23, 2018, 1:36:37 AM11/23/18
to Tong Sun, golan...@googlegroups.com

--
You received this message because you are subscribed to the Google Groups "golang-nuts" group.
To unsubscribe from this group and stop receiving emails from it, send an email to golang-nuts...@googlegroups.com.

For more options, visit https://groups.google.com/d/optout.


--
Michael T. Jones
michae...@gmail.com

Ben Hoyt

unread,
Nov 23, 2018, 3:58:37 PM11/23/18
to sunto...@gmail.com, pa...@lanl.gov, golan...@googlegroups.com

Once you have some proper benchmarks, it might be fun to compare GoAWK's performance to that of my awk package.

I'm not going to do thorough benchmarks at this point, but it looks like GoAWK is significantly faster at present. Using the example in the https://github.com/spakin/awk README, which is equivalent to this AWK script:

    BEGIN { FS = OFS = "," }
    { $3 = $1+$2; print }

On a file with 1M lines of random numbers, with the example as is (no stdout buffering) GoAWK takes about 1.1 seconds, and spakin/awk takes 36 seconds! However, most of this is due to the non-buffered writes to os.Stdout. GoAWK automatically wraps os.Stdout in a bufio.Writer (though I'd forgotten to do this at first as well). When I added the line (before s.Run):

    s.Output = bufio.NewWriterSize(os.Stdout, 64*1024)

It speeds up spakin/awk by a factor of about 10x to 3.6 seconds. So GoAWK is about 3x as fast for this simple (but not unrealistic) benchmark.

I generated the 1M line random file using this Python script (guess I should have used AWK :-):

    import random, sys
    for _ in range(int(sys.argv[1])):
      n = random.randrange(1000000)
      m = random.randrange(1000000)
      print('%d,%d' % (n, m))

So my main suggestion (for spakin/awk) would be able to wrap os.Stdout in a bufio.NewWriter (and be sure to call Flush before Run finishes). If the user wants to pass an unbuffered version, they still can, but at least the default is performant.

I also added CPU profiling to the spakin/awk script, and it looks like it's doing a bunch more garbage collection than GoAWK, as well as some regexp stuff. I suspect NewValue() is probably quite slow as it takes an interface{} and does type checking. Also, strings are converted to numbers using a regex, which is probably slower than a dedicated conversion/check function (see parseFloatPrefix in goawk/interp/value.go).

See more optimization ideas in my post at https://benhoyt.com/writings/goawk/

-Ben

--
You received this message because you are subscribed to a topic in the Google Groups "golang-nuts" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/golang-nuts/kYZp3Q1KKfE/unsubscribe.
To unsubscribe from this group and all its topics, send an email to golang-nuts...@googlegroups.com.

Tong Sun

unread,
Nov 23, 2018, 9:43:22 PM11/23/18
to golang-nuts


On Saturday, November 17, 2018 at 12:44:57 PM UTC-5, Ben Hoyt wrote:

https://benhoyt.com/writings/goawk/

It's pretty much a toy project, not being used for anything real, but it seems to be in a good shape. Feedback welcome!

Hope there are future plans to extend it so that, 

- people can define user function in go and use it in goawk handling, as kty... has pointed out. 
- also enable normal go program to use awk scripts

thx

Denis Cheremisov

unread,
Nov 24, 2018, 6:43:51 AM11/24/18
to golang-nuts
>  - also enable normal go program to use awk scripts

probably this is better tool for the case

суббота, 24 ноября 2018 г., 5:43:22 UTC+3 пользователь Tong Sun написал:

Tong Sun

unread,
Nov 24, 2018, 9:57:26 AM11/24/18
to golan...@googlegroups.com
On Sat, Nov 24, 2018 at 6:44 AM Denis Cheremisov wrote:
>  - also enable normal go program to use awk scripts

probably this is better tool for the case

- please don't just post a link, but include descriptions.
- please justify your claim "better" - a specific log file parser is better than general-purpose awk scripts? In what way?
- please don't hijack a thread in promoting your own tool, especially when it is highly irreverent. 

Thank you for your cooperation

Ben Hoyt

unread,
Nov 24, 2018, 10:09:37 AM11/24/18
to sunto...@gmail.com, golan...@googlegroups.com
 
Hope there are future plans to extend it so that, 
- people can define user function in go and use it in goawk handling, as kty... has pointed out. 

I toyed with this, and enabling it simple functions like "math.Abs(float64) float64". You could add math.Abs to a new config.Funcs map and then reflection would figure out the types at runtime and call it. In AWK you'd just do "abs(-5)" and it'd return 5.

It's a bit harder with functions that take make complex params and return an error. What would the API look like on the AWK side? Currently AWK knows at parse time what kinds (scalar or array) each parameter is. For example, what about an HTTP request function that took method, url, body and returned status code, headers, and response body. I was thinking for this case you could have the concept of "complex function" which always took two parameters, input args array and output values array. So for http_request it'd look something like:

in["method"] = "GET"
in["url"] = "https://golang.org/"
err = http_request(in, out)
if (err != "") {
  print "network error fetching page:", err
  exit 1
}
if (out["status"] == 404) {
  print "page not found"
  exit 1
}
print out["body"]

It's a bit klunky but at least the API is consistent: functions always take input and output arrays and return an error string (or "").

Anyone here have thoughts on a better AWK-level API?

- also enable normal go program to use awk scripts

I'm not sure what you mean here. You can already use GoAWK in normal Go programs against any io.Reader input and io.Writer for output.

-Ben

Ben Hoyt

unread,
Nov 24, 2018, 10:11:30 AM11/24/18
to sunto...@gmail.com, golan...@googlegroups.com
probably this is better tool for the case

- please don't just post a link, but include descriptions.
- please justify your claim "better" - a specific log file parser is better than general-purpose awk scripts? In what way?
- please don't hijack a thread in promoting your own tool, especially when it is highly irreverent. 

For what it's worth, I found that tool quite interesting and relevant. Tong, I think you're overreacting a bit here.

-Ben 

Tong Sun

unread,
Nov 24, 2018, 10:13:06 AM11/24/18
to ben...@gmail.com, golan...@googlegroups.com
On Sat, Nov 24, 2018 at 10:11 AM Ben Hoyt wrote:

For what it's worth, I found that tool quite interesting and relevant. Tong, I think you're overreacting a bit here.

 Oh, I'm sorry, everyone

Scott Pakin

unread,
Nov 26, 2018, 3:20:42 PM11/26/18
to golang-nuts
On Friday, November 23, 2018 at 1:58:37 PM UTC-7, Ben Hoyt wrote:
So my main suggestion (for spakin/awk) would be able to wrap os.Stdout in a bufio.NewWriter (and be sure to call Flush before Run finishes). If the user wants to pass an unbuffered version, they still can, but at least the default is performant.

I also added CPU profiling to the spakin/awk script, and it looks like it's doing a bunch more garbage collection than GoAWK, as well as some regexp stuff. I suspect NewValue() is probably quite slow as it takes an interface{} and does type checking. Also, strings are converted to numbers using a regex, which is probably slower than a dedicated conversion/check function (see parseFloatPrefix in goawk/interp/value.go).

See more optimization ideas in my post at https://benhoyt.com/writings/goawk/

Thanks for doing the performance analysis of spakin/awk!  Once I find the time to work on that project some more, I'll certainly look into implementing some of your suggestions.

— Scott

Ben Hoyt

unread,
Dec 1, 2018, 2:17:26 PM12/1/18
to kty...@gmail.com, golan...@googlegroups.com

I wonder if it makes sense, to expose more of the interpreter to go.
E.g.: register a user function or add an action written in go.

I had thought about this before, but your comment made me want to try it. It wasn't actually that hard, so I've added backwards-compatible support for this now (GoAWK v1.1.0). It works kind of like Funcs() in text/template or html/template.

You can pass in your Go functions as a map[string]interface{}, and as long as they take and return bools, numbers, or strings (or []byte) it'll use reflection to do all the hard work for you. It also supports variadic functions. Functions defined in AWK with "function foo" take precedence over Go functions passed in via Funcs (I did this so that passing in different Funcs doesn't change the behavior of certain AWK scripts). See more docs under the "Funcs" field here: https://godoc.org/github.com/benhoyt/goawk/interp#Config

It's kind of a solution in search of a problem right now, but you could use this to call something simple like strings.Repeat, or something complex like doing an HTTP request. It was also good to learn more about how to use the "reflect" package.

Again, feedback welcome.

-Ben
Reply all
Reply to author
Forward
0 new messages