I'm trying to use a very large regular expression. Basically, I have
a list of items I want to find in text; it's a kind of conjunction of
two regular expressions and a big list... not pretty. However,
every time I try to run my code I get this exception:
OverflowError: regular expression code size limit exceeded
I understand that there is a Python-imposed limit on the size of a
regular expression. Although it's not nice, I have a machine with
12 GB of RAM just waiting to be used. Is there any way I can alter
Python to allow big regular expressions?
Could anyone suggest other methods for this kind of string matching in
Python? I'm trying to see if my SWIG-wrapped alphabet trie is faster
than what's possible in Python!
Many thanks,
Nathan
If what you are trying to match is in fact a set of strings instead of a
set of regular expressions, you might find this useful:
http://pypi.python.org/pypi/acora
Stefan
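
To illustrate the point about matching a set of fixed strings rather
than a regular expression, here is a minimal pure-Python sketch. It does
not use acora and says nothing about how acora works internally; the
names build_trie and find_all are made up for this example. It stores
the strings in a dictionary-based trie and walks the trie from every
position of the text, so no regular expression is compiled and the code
size limit never comes into play.

```python
def build_trie(words):
    """Build a nested-dict trie; the key None marks the end of a word."""
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node[None] = word  # None cannot collide with a one-character key
    return root


def find_all(trie, text):
    """Return (start, word) for every occurrence, overlapping ones included."""
    hits = []
    for start in range(len(text)):
        node = trie
        pos = start
        # Follow the trie for as long as the text keeps matching.
        while pos < len(text) and text[pos] in node:
            node = node[text[pos]]
            pos += 1
            if None in node:
                hits.append((start, node[None]))
    return hits


trie = build_trie(["he", "she", "hers"])
print(find_all(trie, "ushers"))
```

This restarts the walk at every text position, so the worst case grows
with the text length times the longest word. That is fine for a quick
comparison, but a dedicated library is the better choice for large
inputs.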
[...]
> Could anyone suggest other methods for this kind of string matching in
> Python? I'm trying to see if my SWIG-wrapped alphabet trie is faster
> than what's possible in Python!
Since you mention using a trie, I guess it's just a big alternation of
fixed strings. You may want to try the Aho-Corasick algorithm. It
looks like there are several implementations (Google finds at least
two). I would be surprised if any pure Python solution were faster than
tries implemented in C. Don't forget to tell us your findings.
-- Alain.
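
For reference, the Aho-Corasick idea can be sketched in pure Python as
follows. This is only a rough illustration for fixed, non-empty strings,
not one of the existing implementations mentioned above; the names
build_automaton and search are made up for this example. It builds a
trie, adds failure links in breadth-first order, and then scans the text
once, reporting every match as a (start, word) pair.

```python
from collections import deque


def build_automaton(words):
    """Build goto, failure and output tables for non-empty fixed strings."""
    goto = [{}]   # goto[state][char] -> next state
    fail = [0]    # fail[state] -> state for the longest proper suffix
    out = [[]]    # out[state] -> words that end when this state is reached
    for word in words:
        state = 0
        for ch in word:
            nxt = goto[state].get(ch)
            if nxt is None:
                nxt = len(goto)
                goto[state][ch] = nxt
                goto.append({})
                fail.append(0)
                out.append([])
            state = nxt
        out[state].append(word)

    # Breadth-first pass: states at depth 1 keep failure link 0.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            # Inherit the words recognized by the suffix state.
            out[nxt] = out[nxt] + out[fail[nxt]]
    return goto, fail, out


def search(automaton, text):
    """Scan text once and return (start, word) for every match."""
    goto, fail, out = automaton
    state = 0
    hits = []
    for i, ch in enumerate(text):
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for word in out[state]:
            hits.append((i - len(word) + 1, word))
    return hits


automaton = build_automaton(["he", "she", "hers"])
print(search(automaton, "ushers"))
```

Each character of the text is consumed exactly once, which is the main
attraction over trying every string or every start position separately.
Whether a pure Python version like this can compete with a trie
implemented in C would still have to be measured.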