Python module?


~flow

Mar 13, 2010, 12:28:12 PM
to re2-dev
i was a little beflummoxed when i learned with ``sudo find / -name
'*re2*'`` that ``make install`` seemingly had done little more than
add ``/usr/local/include/re2/re2.h`` to my system. there seemed to
be some ``*.a`` file in addition, but then what is it with this
``*.a`` extension?

i would like to use re2 from Python (preferably Python 3.1) and was
excited to see files like ``make_unicode_groups.py`` in the distro.
those, however, were not deployed on my machine.

how can i use re2 from Python?

David Reiss

Mar 14, 2010, 8:27:48 PM
to re2-dev
I wrote a simple Python wrapper to do some benchmarking of RE2 within
one of my programs. I'm cleaning it up now and I'll release it
shortly.

David Reiss

Mar 16, 2010, 3:00:18 PM
to re2-dev
Here it is: http://github.com/facebook/pyre2

If you want to incorporate it into the main distribution, go ahead.
If not, that's fine too.

Alex Willmer

Mar 16, 2010, 6:32:24 PM
to re2-dev
On Mar 16, 7:00 pm, David Reiss <dre...@gmail.com> wrote:
> Here it is: http://github.com/facebook/pyre2
>
> If you want to incorporate it into the main distribution, go ahead.
> If not, that's fine too.

The following changes to the Makefile allowed me to get as far as
importing this module:


diff -r deb45325aab9 Makefile
--- a/Makefile Fri Mar 12 09:52:38 2010 -0800
+++ b/Makefile Tue Mar 16 22:22:33 2010 +0000
@@ -2,7 +2,7 @@
# Use of this source code is governed by a BSD-style
# license that can be found in the LICENSE file.

-all: obj/libre2.a
+all: obj/libre2.so obj/libre2.a

# to build against PCRE for testing or benchmarking,
# uncomment the next two lines
@@ -10,11 +10,14 @@
# LDPCRE=-L/usr/local/lib -lpcre

CC=g++
-CFLAGS=-c -Wall -Wno-sign-compare -O3 -g -I. $(CCPCRE)
+CFLAGS=-c -fPIC -Wall -Wno-sign-compare -O3 -g -I. $(CCPCRE)
AR=ar
ARFLAGS=rsc
NM=nm
NMFLAGS=-p
+LD=gcc
+LDFLAGS=-shared -o
+PREFIX=/usr/local

HFILES=\
util/arena.h\
@@ -114,6 +117,9 @@
@mkdir -p obj
$(AR) $(ARFLAGS) obj/libre2.a $(OFILES)

+obj/libre2.so: $(OFILES)
+ $(LD) $(LDFLAGS) obj/libre2.so $(OFILES)
+
obj/test/%: obj/libre2.a obj/re2/testing/%.o $(TESTOFILES) obj/util/test.o
@mkdir -p obj/test
$(CC) -o $@ obj/re2/testing/$*.o $(TESTOFILES) obj/util/test.o obj/libre2.a -lpthread $(LDPCRE)
@@ -132,17 +138,20 @@

benchmark: obj/test/regexp_benchmark

-install: obj/libre2.a
- mkdir -p /usr/local/include/re2
- install -m 444 re2/re2.h /usr/local/include/re2/re2.h
- install -m 444 re2/stringpiece.h /usr/local/include/re2/stringpiece.h
- install -m 444 re2/variadic_function.h /usr/local/include/re2/variadic_function.h
- install -m 555 obj/libre2.a /usr/local/lib/libre2.a
+install: obj/libre2.a obj/libre2.so
+ mkdir -p $(PREFIX)/include/re2
+ mkdir -p $(PREFIX)/lib
+ install -m 444 re2/re2.h $(PREFIX)/include/re2/re2.h
+ install -m 444 re2/stringpiece.h $(PREFIX)/include/re2/stringpiece.h
+ install -m 444 re2/variadic_function.h $(PREFIX)/include/re2/variadic_function.h
+ install -m 555 obj/libre2.a $(PREFIX)/lib/libre2.a
+ install -m 555 obj/libre2.so $(PREFIX)/lib/libre2.so
+ ldconfig

testinstall:
@mkdir -p obj
cp testinstall.cc obj
- (cd obj && g++ -I/usr/local/include testinstall.cc -lre2 -lpthread -o testinstall)
+ (cd obj && g++ -I$(PREFIX)/include testinstall.cc -lre2 -lpthread -o testinstall)
obj/testinstall

benchlog: obj/test/regexp_benchmark

David Reiss

Mar 16, 2010, 7:01:54 PM
to re2-dev
Oh, right. I totally forgot that I had to build with -fPIC and update
my library search path. I'll update the README.
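
For anyone checking whether the shared library ended up on the loader's search path after a build like the one above, here is a minimal Python sketch. It is not part of pyre2; the helper name `can_load_shared_library` is hypothetical, and it relies only on the standard `ctypes` module. A result of False can mean either that the library was not found or that opening it failed.

```python
import ctypes
import ctypes.util


def can_load_shared_library(name):
    """Return True if a shared library called lib<name> can be found and opened.

    Hypothetical helper for illustration; `name` is given without the
    "lib" prefix or the ".so" suffix, e.g. "re2".
    """
    # find_library uses a platform-specific lookup (on Linux this
    # includes the ldconfig cache) and returns None when nothing matches.
    path = ctypes.util.find_library(name)
    if path is None:
        return False
    try:
        # Opening the library confirms the dynamic loader accepts it.
        ctypes.CDLL(path)
    except OSError:
        return False
    return True


if __name__ == "__main__":
    print("libre2 loadable:", can_load_shared_library("re2"))
```

If this reports False right after ``make install``, the usual suspects are a missing ``ldconfig`` run or an install prefix that is not on the loader's search path.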

~flow

Mar 18, 2010, 8:50:20 AM
to re2-dev
that’s great news! i am delighted! thanks!

Andi Albrecht

Mar 19, 2010, 1:24:06 AM
to re2-dev
Hi David!

Thanks a lot for these bindings! I've played with them a bit and there's a nice performance boost compared to Python's builtin re module for some operations.

Just one nit I came across: sometimes there's a segfault with invalid regular expressions. I'm not quite sure how to reproduce it, as it doesn't always fail. FWIW, here's an example session that produced a segfault:

>>> import re2
>>> p = re2.compile("(foo")
re2/re2.cc:153: Error parsing '(foo': missing ): (foo
...
error: (6, 'missing ): (foo')
>>> p = re2.compile("(foo|")
re2/re2.cc:153: Error parsing '(foo|': missing ): (foo|
Segmentation fault

As mentioned above, this doesn't always fail... I had more luck when using an invalid escape sequence like

>>> import re2
>>> p = re2.compile(r'(foo\1')
re2/re2.cc:153: Error parsing '(foo\1': invalid escape sequence: \1
Segmentation fault

Again thanks and best regards,

Andi
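
For comparison, here is a small sketch of what handling these invalid patterns looks like when compilation fails with an exception rather than a crash. It assumes the wrapper exposes an `error` exception the way the standard `re` module does (the session above suggests so, but treat that as an assumption), and it falls back to `re` when no `re2` module is installed. The helper name `try_compile` is hypothetical. A segfault cannot be caught this way, so this only applies to a build where invalid patterns raise.

```python
try:
    import re2 as regex_module  # assumption: a Python re2 wrapper is installed
except ImportError:
    import re as regex_module   # stand-in so the sketch still runs

# Assumption: the module exposes `error` like the stdlib `re` module;
# fall back to Exception if it does not.
RegexError = getattr(regex_module, "error", Exception)


def try_compile(pattern):
    """Return (compiled_pattern, None) on success or (None, exception) on failure."""
    try:
        return regex_module.compile(pattern), None
    except RegexError as exc:
        return None, exc


if __name__ == "__main__":
    # The first three patterns are the invalid ones from the session above.
    for pattern in ["(foo", "(foo|", r"(foo\1", "(foo)"]:
        compiled, err = try_compile(pattern)
        status = "ok" if err is None else "rejected: %s" % (err,)
        print("%r -> %s" % (pattern, status))
```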

David Reiss

Mar 19, 2010, 1:25:43 PM
to re2-dev
Thanks for the report. The problem should be fixed in the latest
master. Not surprisingly, it was caused by my misunderstanding of the
Python C API.

Russ Cox

Mar 19, 2010, 7:43:08 PM
to David Reiss, re2-dev
> If you want to incorporate it into the main distribution, go ahead.
> If not, that's fine too.

I plan to, thanks for putting it together.

You might also want to pass the RE2::Quiet option
to the constructor, to silence the log prints like

re2/re2.cc:153: Error parsing '(foo': missing ): (foo

Russ

David Reiss

Mar 19, 2010, 9:49:24 PM
to re2-dev
> You might also want to pass the RE2::Quiet option
> to the constructor, to silence the log prints like
>
> re2/re2.cc:153: Error parsing '(foo': missing ): (foo

Good call. Done in latest master. It might be worth documenting that
the constructor doesn't keep a reference to the options. I had to
check the source to confirm that it was okay to use a temporary.
