Large regular expressions

Nathan Harmston

Mar 15, 2010, 8:21:42 AM
to pytho...@python.org
Hi,

I'm trying to use a very large regular expression. Basically I have
a list of items I want to find in text; it's kind of a conjunction of
two regular expressions and a big list... not pretty. However,
every time I try to run my code I get this exception:

OverflowError: regular expression code size limit exceeded

I understand that there is a Python-imposed limit on the size of the
regular expression. And although it's not nice, I have a machine with
12 GB of RAM just waiting to be used. Is there any way I can alter
Python to allow big regular expressions?

Could anyone suggest other methods for this kind of string matching in
Python? I'm trying to see if my SWIG-wrapped alphabet trie is faster than
what's possible in Python!

Many thanks,


Nathan
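
One stopgap, if the big list consists of literal strings, is to split it into several smaller alternations so that no single compiled pattern grows too large. The following is a minimal sketch under that assumption, not something proposed in this thread: `compile_chunks`, `find_terms`, and the `chunk_size` value are hypothetical names and numbers chosen for illustration, and the chunk size at which the limit is hit will depend on the Python version and on the terms themselves.

```python
import re

def compile_chunks(terms, chunk_size=500):
    """Compile a long list of literal terms as several smaller alternations."""
    # Longest first (ties broken alphabetically) so that, within one chunk,
    # "alphabet" is tried before "alpha" at the same position.
    ordered = sorted(set(terms), key=lambda t: (-len(t), t))
    patterns = []
    for start in range(0, len(ordered), chunk_size):
        chunk = ordered[start:start + chunk_size]
        patterns.append(re.compile("|".join(re.escape(t) for t in chunk)))
    return patterns

def find_terms(patterns, text):
    """Return sorted (offset, term) pairs found by any of the chunk patterns."""
    hits = []
    for pattern in patterns:
        hits.extend((m.start(), m.group(0)) for m in pattern.finditer(text))
    return sorted(hits)

# Tiny chunk size only to exercise the chunking logic.
patterns = compile_chunks(["alpha", "beta", "gamma", "alphabet"], chunk_size=2)
print(find_terms(patterns, "the alphabet starts with alpha and beta"))
```

Note that each chunk scans the text separately, so the cost grows with the number of chunks, and a short term that lands in a different chunk from a longer term containing it can be reported inside the longer match.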

Stefan Behnel

Mar 15, 2010, 8:45:31 AM
to pytho...@python.org
Nathan Harmston, 15.03.2010 13:21:

> I'm trying to use a very large regular expression. Basically I have
> a list of items I want to find in text; it's kind of a conjunction of
> two regular expressions and a big list... not pretty. However,
> every time I try to run my code I get this exception:
>
> OverflowError: regular expression code size limit exceeded
>
> I understand that there is a Python-imposed limit on the size of the
> regular expression. And although it's not nice, I have a machine with
> 12 GB of RAM just waiting to be used. Is there any way I can alter
> Python to allow big regular expressions?
>
> Could anyone suggest other methods for this kind of string matching in
> Python?

If what you are trying to match is in fact a set of strings instead of a
set of regular expressions, you might find this useful:

http://pypi.python.org/pypi/acora

Stefan

Alain Ketterlin

Mar 15, 2010, 8:50:27 AM
Nathan Harmston <iwanttob...@googlemail.com> writes:

[...]


> Could anyone suggest other methods for this kind of string matching in
> Python? I'm trying to see if my SWIG-wrapped alphabet trie is faster than
> what's possible in Python!

Since you mention using a trie, I guess it's just a big alternation of
fixed strings. You may want to try using the Aho-Corasick variant. It
looks like there are several implementations (Google finds at least
two). I would be surprised if any pure-Python solution were faster than
tries implemented in C. Don't forget to tell us your findings.

-- Alain.
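
For readers who have not met Aho-Corasick: it builds a trie of the fixed strings, adds "failure" links that say where to resume after a mismatch, and then finds every occurrence of every string in a single pass over the text. Below is a rough pure-Python sketch of the idea. The names `build_automaton` and `find_all` are made up for illustration and do not come from any of the packages mentioned in this thread. As noted above, a C implementation should be expected to be much faster; this is only meant to show the mechanics.

```python
from collections import deque

def build_automaton(keywords):
    """Build the goto, fail and output tables for a set of fixed strings."""
    goto = [{}]   # goto[state][char] -> next state (the trie)
    fail = [0]    # fail[state] -> state to fall back to on a mismatch
    out = [[]]    # out[state] -> keywords that end at this state
    for word in keywords:
        if not word:
            continue  # ignore empty strings
        state = 0
        for ch in word:
            nxt = goto[state].get(ch)
            if nxt is None:
                nxt = len(goto)
                goto[state][ch] = nxt
                goto.append({})
                fail.append(0)
                out.append([])
            state = nxt
        out[state].append(word)
    # Breadth-first pass: depth-1 states fall back to the root; deeper states
    # follow their parent's failure links until the character can be consumed.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            # Inherit keywords that end at the fallback state (proper suffixes).
            out[nxt] = out[nxt] + out[fail[nxt]]
    return goto, fail, out

def find_all(text, automaton):
    """Return (start_offset, keyword) for every occurrence, in scan order."""
    goto, fail, out = automaton
    state = 0
    hits = []
    for i, ch in enumerate(text):
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for word in out[state]:
            hits.append((i - len(word) + 1, word))
    return hits

automaton = build_automaton(["he", "she", "his", "hers"])
print(find_all("ushers", automaton))
```

Unlike a regex alternation, this reports overlapping matches too, so "she", "he" and "hers" are all found in "ushers".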

MRAB

Mar 15, 2010, 11:51:25 AM
to pytho...@python.org
Nathan Harmston wrote:
> Hi,
>
> I'm trying to use a very large regular expression. Basically I have
> a list of items I want to find in text; it's kind of a conjunction of
> two regular expressions and a big list... not pretty. However,
> every time I try to run my code I get this exception:
>
> OverflowError: regular expression code size limit exceeded
>
> I understand that there is a Python-imposed limit on the size of the
> regular expression. And although it's not nice, I have a machine with
> 12 GB of RAM just waiting to be used. Is there any way I can alter
> Python to allow big regular expressions?
>
> Could anyone suggest other methods for this kind of string matching in
> Python? I'm trying to see if my SWIG-wrapped alphabet trie is faster than
> what's possible in Python!
>
There's the regex module at http://pypi.python.org/pypi/regex. It'll
even release the GIL while matching on strings! :-)
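
The regex module is designed to be largely call-compatible with the standard `re` module, so trying it can be as small a change as swapping the import. Here is a minimal sketch, assuming the package has been installed from PyPI; the term list and sample text are made up, and the fallback to `re` is only there so the snippet still runs where regex is not installed. As I understand it from the regex documentation, its matching methods also accept a `concurrent=True` keyword to request that the GIL be released; that is not used here because `re` has no such argument.

```python
try:
    import regex as re_impl   # third-party package from PyPI
except ImportError:
    import re as re_impl      # standard-library fallback so the sketch still runs

# Hypothetical term list; in practice this would be the big list of items.
terms = ["alpha", "beta", "gamma"]

# Escape each literal and put longer terms first so they win over shorter ones.
alternation = "|".join(
    re_impl.escape(t) for t in sorted(terms, key=len, reverse=True)
)
pattern = re_impl.compile(alternation)

matches = [(m.start(), m.group(0)) for m in pattern.finditer("beta before gamma")]
print(matches)
```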