RegEx/string performance benchmarks

3731 views
Skip to first unread message

Alexander Orlov

unread,
Apr 28, 2011, 9:55:01 AM4/28/11
to golang-nuts
Working with Go I've experienced some perceived slowness when
performing RegEx operations on strings. I've looked for benchmarks and
found this one http://shootout.alioth.debian.org/u64q/benchmark.php?test=all&lang=go&lang2=java

In some areas Go kann keep up with Java but when it comes to string
operations ("regex-dna" benchmark), Go is even much slower than Ruby
or Python. Is the status quo going to improve anytime soon? And why is
Go so terribly slow when it comes to string/RegEx operations?

Russ Cox

unread,
Apr 28, 2011, 10:13:21 AM4/28/11
to Alexander Orlov, golang-nuts
> In some areas Go kann keep up with Java but when it comes to string
> operations ("regex-dna" benchmark), Go is even much slower than Ruby
> or Python. Is the status quo going to improve anytime soon? And why is
> Go so terribly slow when it comes to string/RegEx operations?

You assume the benchmark is worth something.

First of all, Ruby and Python are using C implementations
of the regexp search, so Go is being beat by C, not by Ruby.

Second, Go is using a different algorithm for regexp matching
than the C implementations in those other languages.
The algorithm Go uses guarantees to complete in time that is
linear in the length of the input. The algorithm that Ruby/Python/etc
are using can take time exponential in the length of the input,
although on trivial cases it typically runs quite fast.
In order to guarantee the linear time bound, Go's algorithm's
best case speed a little slower than the optimistic Ruby/Python/etc
algorithm. On the other hand, there are inputs for which Go will
return quickly and Ruby/Python/etc need more time than is left
before the heat death of the universe. It's a decent tradeoff.

http://swtch.com/~rsc/regexp/regexp1.html

Russ

Alexander Orlov

unread,
Apr 28, 2011, 10:21:46 AM4/28/11
to r...@golang.org, golang-nuts
On Thu, Apr 28, 2011 at 4:13 PM, Russ Cox <r...@golang.org> wrote:
Second, Go is using a different algorithm for regexp matching
than the C implementations in those other languages.

http://swtch.com/~rsc/regexp/regexp1.html

To the point, thx!

As I saw that all other languages (not considering the same implementation/algorithm they've used) were faster in *this particular*  benchmark, I've thought it was a general Go string/regex problem.

Alex

Gustavo Niemeyer

unread,
Apr 28, 2011, 10:23:00 AM4/28/11
to golang-nuts
> algorithm.  On the other hand, there are inputs for which Go will
> return quickly and Ruby/Python/etc need more time than is left
> before the heat death of the universe.  It's a decent tradeoff.

Now that you understand that you can already participate in the "I
have killed a web service with a regex" Facebook group.

--
Gustavo Niemeyer
http://niemeyer.net
http://niemeyer.net/blog
http://niemeyer.net/twitter

Isaac Gouy

unread,
Apr 28, 2011, 11:58:31 AM4/28/11
to golang-nuts


On Apr 28, 7:13 am, Russ Cox <r...@golang.org> wrote:
> > In some areas Go kann keep up with Java but when it comes to string
> > operations ("regex-dna" benchmark), Go is even much slower than Ruby
> > or Python. Is the status quo going to improve anytime soon? And why is
> > Go so terribly slow when it comes to string/RegEx operations?
>
> You assume the benchmark is worth something.
>
> First of all, Ruby and Python are using C implementations
> of the regexp search, so Go is being beat by C, not by Ruby.

I'm not sure what you mean by that -

a) do you mean those particular Ruby and Python programs use ffi to
call C library functions?

b) do you mean standard Ruby and Python libraries are themselves
written in C?

Russ Cox

unread,
Apr 28, 2011, 12:04:17 PM4/28/11
to Isaac Gouy, golang-nuts
>> First of all, Ruby and Python are using C implementations
>> of the regexp search, so Go is being beat by C, not by Ruby.
>
> I'm not sure what you mean by that -
>
> a) do you mean those particular Ruby and Python programs use ffi to
> call C library functions?
>
> b) do you mean standard Ruby and Python libraries are themselves
> written in C?

b

Gustavo Niemeyer

unread,
Apr 28, 2011, 12:22:16 PM4/28/11
to golang-nuts
> b) do you mean standard Ruby and Python libraries are themselves
> written in C?

http://svn.python.org/projects/python/trunk/Modules/_sre.c

That's part of the beauty of Go.

Isaac Gouy

unread,
Apr 28, 2011, 1:08:44 PM4/28/11
to golang-nuts
Which is to say Ruby and Python are "scripting languages"

http://www.tcl.tk/doc/scripting.html


Gustavo Niemeyer

unread,
Apr 28, 2011, 1:27:15 PM4/28/11
to Isaac Gouy, golang-nuts
> Which is to say Ruby and Python are "scripting languages"
>
> http://www.tcl.tk/doc/scripting.html

The "scripting" line is pretty blurred nowadays. A lot of the things
being said there apply to Go as well, and many of them make no sense
for Python.

Isaac Gouy

unread,
Apr 28, 2011, 1:39:38 PM4/28/11
to golang-nuts


On Apr 28, 10:27 am, Gustavo Niemeyer <gust...@niemeyer.net> wrote:
> > Which is to say Ruby and Python are "scripting languages"
>
> >http://www.tcl.tk/doc/scripting.html
>
> The "scripting" line is pretty blurred nowadays.  A lot of the things
> being said there apply to Go as well, and many of them make no sense
> for Python.

That isn't any where near specific enough for me to agree or disagree
with, but as we're rapidly heading off-topic for go-lang I'll just let
it go.

Gustavo Niemeyer

unread,
Apr 28, 2011, 1:44:50 PM4/28/11
to Isaac Gouy, golang-nuts
> That isn't any where near specific enough for me to agree or disagree

Yeah, that's exactly the idea. ;-)

Russ Cox

unread,
Apr 28, 2011, 1:48:42 PM4/28/11
to Isaac Gouy, golang-nuts
>> > b) do you mean standard Ruby and Python libraries are themselves
>> > written in C?
>>
>> b
>
> Which is to say Ruby and Python are "scripting languages"

By that definition I guess C and C++ are scripting languages too.

Isaac Gouy

unread,
Apr 28, 2011, 2:55:19 PM4/28/11
to golang-nuts
Not Go?

I guess not, because "Go was born out of frustration with existing
languages and environments for systems programming" to fill that "[n]o
major systems language has emerged in over a decade" space.

For example (as you already pointed out) Go isn't using a C
implementation for the regexp library like Ruby and Python do.

Russ Cox

unread,
Apr 28, 2011, 3:05:21 PM4/28/11
to Isaac Gouy, golang-nuts
On Thu, Apr 28, 2011 at 14:55, Isaac Gouy <igo...@yahoo.com> wrote:
> On Apr 28, 10:48 am, Russ Cox <r...@google.com> wrote:
>> >> > b) do you mean standard Ruby and Python libraries are themselves
>> >> > written in C?
>>
>> >> b
>>
>> > Which is to say Ruby and Python are "scripting languages"
>>
>> By that definition I guess C and C++ are scripting languages too.
>
> Not Go?

You said the definition was that if libraries are written in C
then it's a scripting language. Standard Go libraries such as
package regexp are written in Go, not in C.

Russ

Isaac Gouy

unread,
Apr 28, 2011, 3:29:02 PM4/28/11
to golang-nuts
I had thought you were objecting to the vagueness or dated nature of
Ousterhout's distinction between systems programming and scripting -
but now it seems you want to take the wording I used to distinguish
between FFI and batteries included standard libraries as though that
wording was intended to be a definition of scripting languages.

Florian Weimer

unread,
Apr 30, 2011, 2:37:21 PM4/30/11
to golang-nuts
* Russ Cox:

> Second, Go is using a different algorithm for regexp matching
> than the C implementations in those other languages.
> The algorithm Go uses guarantees to complete in time that is
> linear in the length of the input. The algorithm that Ruby/Python/etc
> are using can take time exponential in the length of the input,
> although on trivial cases it typically runs quite fast.

Go could run both algorithms in parallel and use the result that comes
first. (Just kidding.)

Michael Jones

unread,
Apr 30, 2011, 3:50:57 PM4/30/11
to Florian Weimer, golang-nuts
Many high-speed dedicated hardware systems do just this. (not for regular expressions, but for image matching/tracking in missile seeker heads and so on.)
--

Michael T. Jones

   Chief Technology Advocate, Google Inc.

   1600 Amphitheatre Parkway, Mountain View, California 94043

   Email: m...@google.com  Mobile: 650-335-5765  Fax: 650-649-1938

   Organizing the world's information to make it universally accessible and useful


Reply all
Reply to author
Forward
0 new messages