[ANN] Lua (including non-greedy) pattern matching in Go!

301 views
Skip to first unread message

Jim Whitehead II

unread,
Jun 27, 2010, 3:52:28 PM6/27/10
to golang-nuts
I've just finished the base implementation of Lua pattern matching in
Go [1], supporting string.match and string.find. The package
implements a subset of Lua patterns in Go. It is not an idiomatic
port, but rather uses as close to a line-by-line translation of the
original C source. This means this package incurs quite a bit of
overhead by implementing a string pointer type. I have still found the
speed to be acceptable for minor pattern matching needs.

On a worst case pattern match with backtracking:

string.match(("a"):rep(1e7) .. "b", ".-b")'

Lua performs 54 times better than the equivalent program using this
package. This is almost certainly to do with the fact that I
implemented a stringPointer type that allowed for a line-by-line port.
It's important to note that this is not an idiomatic port, and
therefore is not designed to be fast, although I would like to get
there eventually.

I doubt this will be of major use to anyone, but please let me know if
you decide to use it and have any feedback or suggestions.

[1]: http://github.com/jnwhiteh/go-luapatterns

Jessta

unread,
Jun 28, 2010, 4:38:28 AM6/28/10
to Jim Whitehead II, golang-nuts
On Mon, Jun 28, 2010 at 5:52 AM, Jim Whitehead II <jnwh...@gmail.com> wrote:
>This means this package incurs quite a bit of
> overhead by implementing a string pointer type.

What does this string pointer type do that a byte slice doesn't?
From a brief look it seems to do exactly the same thing.

- jessta
--
=====================
http://jessta.id.au

Jim Whitehead II

unread,
Jun 28, 2010, 4:47:49 AM6/28/10
to Jessta, golang-nuts
On Mon, Jun 28, 2010 at 9:38 AM, Jessta <jes...@gmail.com> wrote:
> On Mon, Jun 28, 2010 at 5:52 AM, Jim Whitehead II <jnwh...@gmail.com> wrote:
>>This means this package incurs quite a bit of
>> overhead by implementing a string pointer type.
>
> What does this string pointer type do that a byte slice doesn't?
> From a brief look it seems to do exactly the same thing.

Of course they accomplish the same task, but a line-by-line port of
code that is overly using pointer arithmetic is more difficult when
you need to convert everything to using slices. That and once you've
moved forward in a slice you can't go back, i.e. there's no way to
p--.

This was merely a first attempt to better understand the Lua pattern
matching code, as i said, a more idiomatic solution will likely follow
if I find myself using this quite a lot.

- Jim

ptolomy23

unread,
Jun 28, 2010, 11:33:26 AM6/28/10
to Jim Whitehead II, golang-nuts
Nice; thanks for sharing.
I had the same inclination a few months back and made a naive Go implementation of Lua patterns. No doubt mine is less complete than yours; I didn't really use the Lua implementation as a reference. I also haven't done any benchmarks.
It also isn't particularly unicode friendly, but I think with a minor change to charSource.Advance (and probably charSet), it could be made so.
Maybe you'll find it interesting:

Jim Whitehead

unread,
Aug 18, 2010, 4:42:33 AM8/18/10
to golang-nuts
The latest version of go-luapatterns is not available on GitHub [1].
In this version I have converted the pattern matching code to use byte
slices rather than using a contrived string pointer type. All of the
tests I had previously written are working properly although it's
possible there are corner cases I am not testing at the moment.

The biggest advantage of this update is speed, it is significantly
faster than the previous version due to the lack of type overhead. It
is still about three time slower than the stock Lua interpreter,
something which is a bit disappointing. I'm sure there are things I am
still not doing in the most efficient way possible, but this is the
first pass of an idiomatic version of this code.

Please let me know if you have any feedback or suggestions.

[1]: http://github.com/jnwhiteh/go-luapatterns

Jim Whitehead

unread,
Aug 18, 2010, 6:08:57 AM8/18/10
to golang-nuts
> On Aug 18, 9:42 am, Jim Whitehead <jnwhi...@gmail.com> wrote:
> The latest version of go-luapatternsis not available on GitHub [1].
luapatterns.Replace(previousPost, "is not available", "is now
available")

- Jim

Archos

unread,
Aug 18, 2010, 8:24:09 AM8/18/10
to golang-nuts
Thanks for build it, but how is it licensed?

* As suggestion, could be used a shorter name. The package could be
called 'pattern' instead of 'luapatterns'. And it would be better if
all Go code were into a same sub-directory, called i.e. pattern

Then, it could be installed via goinstall as:
goinstall github.com/jnwhiteh/go-luapattern/pattern

to be imported as:
"github.com/jnwhiteh/go-luapattern/pattern"

and finally, to use as:
pattern.Match()

Archos

unread,
Aug 18, 2010, 8:30:45 AM8/18/10
to golang-nuts
Another thing. You should remove the directory 'lua-5.1.4', because
'goinstall' installs all from your base directory, in that case from
"github.com/jnwhiteh/go-luapattern".

Jim Whitehead

unread,
Aug 19, 2010, 4:21:38 AM8/19/10
to golang-nuts
Thanks for the suggestions. When I'm off holiday I'll clarify the
license and make these changes. I'll probably go with MIT, same as
Lua, unless the Go license should be something else.

Jim Whitehead

unread,
Aug 19, 2010, 6:28:20 AM8/19/10
to golang-nuts
On Aug 18, 1:24 pm, Archos <raul....@sent.com> wrote:
> Thanks for build it, but how is it licensed?

MIT license, and the source has been update accordingly

> * As suggestion, could be used a shorter name. The package could be
> called 'pattern' instead of 'luapatterns'. And it would be better if
> all Go code were into a same sub-directory, called i.e. pattern

You can do this yourself, there's no reason for me to make a purposely
short name when you can do it at import time. Give it whatever name
you'd like. Also, changing it to a subdirectory pattern does not (for
this particular package it seems) change the import name.

> Then, it could be installed via goinstall as:
>   goinstall github.com/jnwhiteh/go-luapattern/pattern

This is indeed how it has been reorganised, but not for these reasons.

> to be imported as:
>   "github.com/jnwhiteh/go-luapattern/pattern"
>
> and finally, to use as:
>   pattern.Match()

You could always have done this:

import pattern "github.com/jnwhiteh/go-luapattern"

Give it any name you'd like. The source has been updated so lua-5.1.4
isn't installed and the go source is kept separate.

- Jim

Jim Whitehead

unread,
Aug 19, 2010, 7:12:52 AM8/19/10
to golang-nuts
I've just done some very basic benchmarks against the regexp package,
which is of course still under development, and the performance of
luapatterns seems to be quite good. For the following example and the
equivalent regexp:

test := strings.Repeat("a", 1e8) + "b"
succ, _ := pattern.Match(test, "^.+b")

The regexp package completes the match in 47s on my machine, while the
Lua pattern version takes 2.9s. Testing a byte-native version of the
same gives slightly faster results with the same spread (1.39s and
45s). Although this single test isn't meant to be a representative
sample of all patterns and regular expressions, it should give us a
good impression of performance.

In the future I would like to support the frontier pattern, but I am
having some issues with it at the moment.

- Jim
Reply all
Reply to author
Forward
0 new messages