RegEx/string performance benchmarks

4,116 views
Skip to first unread message

Alexander Orlov

unread,
Apr 28, 2011, 9:55:01 AM4/28/11
to golang-nuts
Working with Go I've experienced some perceived slowness when
performing RegEx operations on strings. I've looked for benchmarks and
found this one http://shootout.alioth.debian.org/u64q/benchmark.php?test=all&lang=go&lang2=java

In some areas Go kann keep up with Java but when it comes to string
operations ("regex-dna" benchmark), Go is even much slower than Ruby
or Python. Is the status quo going to improve anytime soon? And why is
Go so terribly slow when it comes to string/RegEx operations?

Russ Cox

unread,
Apr 28, 2011, 10:13:21 AM4/28/11
to Alexander Orlov, golang-nuts
> In some areas Go kann keep up with Java but when it comes to string
> operations ("regex-dna" benchmark), Go is even much slower than Ruby
> or Python. Is the status quo going to improve anytime soon? And why is
> Go so terribly slow when it comes to string/RegEx operations?

You assume the benchmark is worth something.

First of all, Ruby and Python are using C implementations
of the regexp search, so Go is being beat by C, not by Ruby.

Second, Go is using a different algorithm for regexp matching
than the C implementations in those other languages.
The algorithm Go uses guarantees to complete in time that is
linear in the length of the input. The algorithm that Ruby/Python/etc
are using can take time exponential in the length of the input,
although on trivial cases it typically runs quite fast.
In order to guarantee the linear time bound, Go's algorithm's
best case speed a little slower than the optimistic Ruby/Python/etc
algorithm. On the other hand, there are inputs for which Go will
return quickly and Ruby/Python/etc need more time than is left
before the heat death of the universe. It's a decent tradeoff.

http://swtch.com/~rsc/regexp/regexp1.html

Russ

Alexander Orlov

unread,
Apr 28, 2011, 10:21:46 AM4/28/11
to r...@golang.org, golang-nuts
On Thu, Apr 28, 2011 at 4:13 PM, Russ Cox <r...@golang.org> wrote:
Second, Go is using a different algorithm for regexp matching
than the C implementations in those other languages.

http://swtch.com/~rsc/regexp/regexp1.html

To the point, thx!

As I saw that all other languages (not considering the same implementation/algorithm they've used) were faster in *this particular*  benchmark, I've thought it was a general Go string/regex problem.

Alex

Gustavo Niemeyer

unread,
Apr 28, 2011, 10:23:00 AM4/28/11
to golang-nuts
> algorithm.  On the other hand, there are inputs for which Go will
> return quickly and Ruby/Python/etc need more time than is left
> before the heat death of the universe.  It's a decent tradeoff.

Now that you understand that you can already participate in the "I
have killed a web service with a regex" Facebook group.

--
Gustavo Niemeyer
http://niemeyer.net
http://niemeyer.net/blog
http://niemeyer.net/twitter

Isaac Gouy

unread,
Apr 28, 2011, 11:58:31 AM4/28/11
to golang-nuts


On Apr 28, 7:13 am, Russ Cox <r...@golang.org> wrote:
> > In some areas Go kann keep up with Java but when it comes to string
> > operations ("regex-dna" benchmark), Go is even much slower than Ruby
> > or Python. Is the status quo going to improve anytime soon? And why is
> > Go so terribly slow when it comes to string/RegEx operations?
>
> You assume the benchmark is worth something.
>
> First of all, Ruby and Python are using C implementations
> of the regexp search, so Go is being beat by C, not by Ruby.

I'm not sure what you mean by that -

a) do you mean those particular Ruby and Python programs use ffi to
call C library functions?

b) do you mean standard Ruby and Python libraries are themselves
written in C?

Russ Cox

unread,
Apr 28, 2011, 12:04:17 PM4/28/11
to Isaac Gouy, golang-nuts
>> First of all, Ruby and Python are using C implementations
>> of the regexp search, so Go is being beat by C, not by Ruby.
>
> I'm not sure what you mean by that -
>
> a) do you mean those particular Ruby and Python programs use ffi to
> call C library functions?
>
> b) do you mean standard Ruby and Python libraries are themselves
> written in C?

b

Gustavo Niemeyer

unread,
Apr 28, 2011, 12:22:16 PM4/28/11
to golang-nuts
> b) do you mean standard Ruby and Python libraries are themselves
> written in C?

http://svn.python.org/projects/python/trunk/Modules/_sre.c

That's part of the beauty of Go.

Isaac Gouy

unread,
Apr 28, 2011, 1:08:44 PM4/28/11
to golang-nuts
Which is to say Ruby and Python are "scripting languages"

http://www.tcl.tk/doc/scripting.html


Gustavo Niemeyer

unread,
Apr 28, 2011, 1:27:15 PM4/28/11
to Isaac Gouy, golang-nuts
> Which is to say Ruby and Python are "scripting languages"
>
> http://www.tcl.tk/doc/scripting.html

The "scripting" line is pretty blurred nowadays. A lot of the things
being said there apply to Go as well, and many of them make no sense
for Python.

Isaac Gouy

unread,
Apr 28, 2011, 1:39:38 PM4/28/11
to golang-nuts


On Apr 28, 10:27 am, Gustavo Niemeyer <gust...@niemeyer.net> wrote:
> > Which is to say Ruby and Python are "scripting languages"
>
> >http://www.tcl.tk/doc/scripting.html
>
> The "scripting" line is pretty blurred nowadays.  A lot of the things
> being said there apply to Go as well, and many of them make no sense
> for Python.

That isn't any where near specific enough for me to agree or disagree
with, but as we're rapidly heading off-topic for go-lang I'll just let
it go.

Gustavo Niemeyer

unread,
Apr 28, 2011, 1:44:50 PM4/28/11
to Isaac Gouy, golang-nuts
> That isn't any where near specific enough for me to agree or disagree

Yeah, that's exactly the idea. ;-)

Russ Cox

unread,
Apr 28, 2011, 1:48:42 PM4/28/11
to Isaac Gouy, golang-nuts
>> > b) do you mean standard Ruby and Python libraries are themselves
>> > written in C?
>>
>> b
>
> Which is to say Ruby and Python are "scripting languages"

By that definition I guess C and C++ are scripting languages too.

Isaac Gouy

unread,
Apr 28, 2011, 2:55:19 PM4/28/11
to golang-nuts
Not Go?

I guess not, because "Go was born out of frustration with existing
languages and environments for systems programming" to fill that "[n]o
major systems language has emerged in over a decade" space.

For example (as you already pointed out) Go isn't using a C
implementation for the regexp library like Ruby and Python do.

Russ Cox

unread,
Apr 28, 2011, 3:05:21 PM4/28/11
to Isaac Gouy, golang-nuts
On Thu, Apr 28, 2011 at 14:55, Isaac Gouy <igo...@yahoo.com> wrote:
> On Apr 28, 10:48 am, Russ Cox <r...@google.com> wrote:
>> >> > b) do you mean standard Ruby and Python libraries are themselves
>> >> > written in C?
>>
>> >> b
>>
>> > Which is to say Ruby and Python are "scripting languages"
>>
>> By that definition I guess C and C++ are scripting languages too.
>
> Not Go?

You said the definition was that if libraries are written in C
then it's a scripting language. Standard Go libraries such as
package regexp are written in Go, not in C.

Russ

Isaac Gouy

unread,
Apr 28, 2011, 3:29:02 PM4/28/11
to golang-nuts
I had thought you were objecting to the vagueness or dated nature of
Ousterhout's distinction between systems programming and scripting -
but now it seems you want to take the wording I used to distinguish
between FFI and batteries included standard libraries as though that
wording was intended to be a definition of scripting languages.

Florian Weimer

unread,
Apr 30, 2011, 2:37:21 PM4/30/11
to golang-nuts
* Russ Cox:

> Second, Go is using a different algorithm for regexp matching
> than the C implementations in those other languages.
> The algorithm Go uses guarantees to complete in time that is
> linear in the length of the input. The algorithm that Ruby/Python/etc
> are using can take time exponential in the length of the input,
> although on trivial cases it typically runs quite fast.

Go could run both algorithms in parallel and use the result that comes
first. (Just kidding.)

Michael Jones

unread,
Apr 30, 2011, 3:50:57 PM4/30/11
to Florian Weimer, golang-nuts
Many high-speed dedicated hardware systems do just this. (not for regular expressions, but for image matching/tracking in missile seeker heads and so on.)
--

Michael T. Jones

   Chief Technology Advocate, Google Inc.

   1600 Amphitheatre Parkway, Mountain View, California 94043

   Email: m...@google.com  Mobile: 650-335-5765  Fax: 650-649-1938

   Organizing the world's information to make it universally accessible and useful


Message has been deleted

a.ko...@gmail.com

unread,
Nov 30, 2025, 1:13:04 AM (yesterday) Nov 30
to golang-nuts
Hi all,

I know this thread is 10+ years old, but I wanted to follow up since
the regexp performance discussion is still highly relevant today.

TL;DR: The situation has improved slightly over the years, but the
fundamental performance characteristics haven't changed dramatically.
So I built coregex - an alternative regex engine for Go that addresses
the performance issues discussed here.


What's Changed in Go stdlib (2013-2025)
========================================

The good:
  - Bug fixes and stability improvements
  - Better Unicode handling
  - Minor optimizations here and there

The unchanged:
  - Still uses Thompson's NFA exclusively
  - No SIMD optimizations
  - No prefilter strategies
  - Same single-engine architecture

Go's regexp prioritizes correctness and simplicity over raw performance.
That's a valid design choice - it guarantees O(n) time complexity and
prevents ReDoS attacks. But for regex-heavy workloads, the performance
gap vs other languages remains significant.


The Performance Gap Today (2025)
=================================

Benchmarking against Rust's regex crate on patterns like .*error.*connection.*:

  - Go stdlib: 12.6ms (250KB input)
  - Rust regex: ~20µs (same input)
  - Gap: ~600x slower

This isn't a criticism of Go - it's a different set of trade-offs.
But it shows the problem hasn't gone away.


What I Built: coregex
=====================

After hitting regex bottlenecks in production, I spent 6 months building
coregex - a drop-in replacement for Go's regexp.

GitHub: https://github.com/coregx/coregex

Architecture:
  - Multi-engine strategy selection (DFA/NFA/specialized engines)
  - SIMD-accelerated prefilters (AVX2 assembly)
  - Bidirectional search for patterns like .*keyword.*
  - Zero allocations in hot paths

Performance (vs stdlib):
  - 3-3000x faster depending on pattern
  - Maintains O(n) guarantees (no backtracking)
  - Drop-in API compatibility

Real benchmarks:

  Pattern              Input   stdlib    coregex   Speedup
  -------------------------------------------------------
  .*\.txt$            1MB     27ms      21µs      1,314x
  .*error.*           250KB   12.6ms    4µs       3,154x
  (?i)error           32KB    1.23ms    4.7µs     263x
  \w+@\w+\.\w+        1KB     688ns     196ns     3.5x

Status: v0.8.0 released, MIT licensed, 88% test coverage


Could This Go Into stdlib?
===========================

That's the interesting question. I've been thinking about this from
several angles:

Challenges:
  1. Complexity - Multi-engine architecture is significantly more
     complex than current implementation
  2. Maintenance burden - SIMD assembly needs platform-specific
     variants (AVX2, NEON, etc.)
  3. Binary size - Multiple engines increase compiled binary size
  4. API stability - stdlib changes need extreme care

Opportunities:
  1. Incremental adoption - Could start with just SIMD primitives
     (internal/bytealg improvements)
  2. Opt-in optimizations - Keep current implementation as default,
     offer regexp/fast package
  3. Strategy selection - Add smart path selection without breaking
     existing code
  4. Knowledge transfer - Techniques from coregex could inform stdlib
     improvements


What I'm Proposing
==================

Rather than a direct "merge coregex into stdlib" proposal, I'm suggesting:

  1. Short term: Community uses coregex for performance-critical workloads
  2. Medium term: Discuss which techniques could benefit stdlib
     (SIMD byte search, prefilters)
  3. Long term: Potential collaboration on stdlib improvements
     (if there's interest)

I'd be happy to:
  - Help with stdlib patches for incremental improvements
  - Share implementation learnings and benchmarks
  - Discuss compatibility considerations


For Those Interested
====================

Try it:
  go get github.com/coregx/cor...@v0.8.0

Read more:
  - Dev.to article:
    https://dev.to/kolkov/gos-regexp-is-slow-so-i-built-my-own-3000x-faster-3i6h
  - GitHub repo:
    https://github.com/coregx/coregex
  - v0.8.0 release:
    https://github.com/coregx/coregex/releases/tag/v0.8.0

Feedback welcome on:
  - API compatibility issues
  - Performance on your specific patterns
  - Ideas for stdlib integration


The Bottom Line
===============

The regexp performance discussion from 10+ years ago was valid then and
remains valid now. The good news: we have options today. The better news:
maybe some of these ideas will make their way into stdlib eventually.

In the meantime, coregex is production-ready and MIT-licensed. Use it if
it helps.

Cheers,
Andrey Kolkov
GitHub: https://github.com/kolkov
CoreGX (Production Go Libraries): https://github.com/coregx

Robert Engels

unread,
Nov 30, 2025, 2:00:27 AM (yesterday) Nov 30
to a.ko...@gmail.com, golang-nuts
Very cool. 

On Nov 30, 2025, at 12:19 AM, a.ko...@gmail.com <a.ko...@gmail.com> wrote:

Hi all,
--
You received this message because you are subscribed to the Google Groups "golang-nuts" group.
To unsubscribe from this group and stop receiving emails from it, send an email to golang-nuts...@googlegroups.com.
To view this discussion visit https://groups.google.com/d/msgid/golang-nuts/ba9bb686-3db1-4d5c-b92a-d5cdd9f6814cn%40googlegroups.com.

a.ko...@gmail.com

unread,
Nov 30, 2025, 7:19:54 AM (yesterday) Nov 30
to golang-nuts

Thanks! )

Jan Mercl

unread,
Nov 30, 2025, 8:12:23 AM (yesterday) Nov 30
to a.ko...@gmail.com, golang-nuts
On Sun, Nov 30, 2025 at 7:11 AM a.ko...@gmail.com <a.ko...@gmail.com> wrote:

> TL;DR: The situation has improved slightly over the years, but the
> fundamental performance characteristics haven't changed dramatically.
> So I built coregex - an alternative regex engine for Go that addresses
> the performance issues discussed here.

Related alt.: https://gitlab.com/cznic/regexp/-/blob/3c832865fb05b33f13ad546144a456e6e821bfb7/README.md

Brian Candler

unread,
9:03 AM (12 hours ago) 9:03 AM
to golang-nuts
The Performance Gap Today (2025)
=================================

Benchmarking against Rust's regex crate on patterns like .*error.*connection.*:

  - Go stdlib: 12.6ms (250KB input)
  - Rust regex: ~20µs (same input)
  - Gap: ~600x slower

Can you show the actual code for these benchmarks? Is it possible that the Go example is including the time compiling the NFA? If that's the issue, then it can be precompiled and then used when required.
 
As for rust, it's impressive just to be able to read 250KB from RAM in 20µs (that's about 12GB/sec)

Reply all
Reply to author
Forward
0 new messages