Perl SPVM::Regex module is released that is binding Google RE2.

18 views
Skip to first unread message

Yuki Kimoto

unread,
Jul 19, 2022, 6:49:10 PM7/19/22
to re2-dev
Perl SPVM::Regex module is released that is binding Google RE2.

SPVM::Regex - Regular Expression - metacpan.org


Is it possible for the RE2 community to see it?

Paul Wankadia

unread,
Jul 20, 2022, 2:18:04 AM7/20/22
to Yuki Kimoto, re2-dev
It will be seen by anyone who reads this mailing list. :)

In terms of Perl bindings, re::engine::RE2 already existed. OTOH, SPVM::Regex seems to be specific to SPVM, which appears to be an experimental project; at this stage, I presume that any SPVM users would know (or be able to discover) how to match regular expressions.

Yuki Kimoto

unread,
Jul 20, 2022, 2:35:46 AM7/20/22
to Paul Wankadia, re2-dev
> In terms of Perl bindings, re::engine::RE2 already existed.

Yes. However SPVM::Regex is different from re::engine::RE2 in the way of the implementation and the purpose.

>  SPVM::Regex seems to be specific to SPVM, which appears to be an experimental project.

Partially yes. However, the SPVM module is used as a Perl module. The following is pure Perl code.

  use SPVM 'Regex';
  
  my $re = SPVM::Regex->new("ab+");
  my $match = $re->match("abbb");

So SPVM::Regex is not specific to *only* SPVM.

SPVM is experimental status. This is right.

>  I presume that any SPVM users would know (or be able to discover) how to match regular expressions.

Oh, Sorry, What I wanted to see was the implementation of C++ code of SPVM::Regex, because I'm new to C++.

Paul Wankadia

unread,
Jul 20, 2022, 2:45:56 AM7/20/22
to Yuki Kimoto, re2-dev
On Wed, Jul 20, 2022 at 4:35 PM Yuki Kimoto <kimot...@gmail.com> wrote:

>  I presume that any SPVM users would know (or be able to discover) how to match regular expressions.

Oh, Sorry, What I wanted to see was the implementation of C++ code of SPVM::Regex, because I'm new to C++.

Oh, are you requesting a code review? Sure, I can take a look tonight and follow up with comments. https://github.com/yuki-kimoto/SPVM-Regex/blob/master/lib/SPVM/Regex.cpp is all of the C++ code, yeah?

Yuki Kimoto

unread,
Jul 20, 2022, 3:02:25 AM7/20/22
to Paul Wankadia, re2-dev
Yes, thank you very much.  That is all C++ code.

Especially, 

1. Is the way to get captures correct (RE2::PartialMatchN)?


2. Is C++ allocated memory or the object on the call stack is freed correctly?

Paul Wankadia

unread,
Jul 20, 2022, 9:58:54 AM7/20/22
to Yuki Kimoto, re2-dev
Okay, here are the three big issues that caught my eye:

https://github.com/yuki-kimoto/SPVM-Regex/blob/master/lib/SPVM/Regex.cpp#L30
pattern_length is unused. I suggest using it to construct an re2::StringPiece and passing that as the pattern to compile. That would save an unnecessary strlen(3) call and also be NUL-safe.

re2 is leaked on error. I suggest using std::unique_ptr<RE2> to own the pointer and passing ownership to env by calling .release().

Doing this will cause subtle correctness problems because RE2 can't establish context properly. I suggest calling RE2::Match() instead. That explicitly supports start/end positions, automatically gives you $0 and otherwise makes it easier to use capturing groups. (However, note that captures_length would need to be incremented by one to account for $0.)

Yuki Kimoto

unread,
Jul 20, 2022, 8:45:11 PM7/20/22
to Paul Wankadia, re2-dev
Thanks, I'll check it in a few days. 

2022年7月20日(水) 22:58 Paul Wankadia <jun...@google.com>:

Yuki Kimoto

unread,
Jul 21, 2022, 2:25:06 AM7/21/22
to Paul Wankadia, re2-dev
Okay, here are the three big issues that caught my eye:

https://github.com/yuki-kimoto/SPVM-Regex/blob/master/lib/SPVM/Regex.cpp#L30
pattern_length is unused. I suggest using it to construct an re2::StringPiece and passing that as the pattern to compile. That would save an unnecessary strlen(3) call and also be NUL-safe.



 
re2 is leaked on error. I suggest using std::unique_ptr<RE2> to own the pointer and passing ownership to env by calling .release().


I delete RE2 object in the destructor(DESTROY method, a SPVM feature). This didn't work well.

 
Doing this will cause subtle correctness problems because RE2 can't establish context properly. I suggest calling RE2::Match() instead. That explicitly supports start/end positions, automatically gives you $0 and otherwise makes it easier to use capturing groups. (However, note that captures_length would need to be incremented by one to account for $0.)


I use RE2::Match().


Because I can't write the code submatch is freed automatically using unique/shared pointer, I write the code at the end of the function.


 

Paul Wankadia

unread,
Jul 21, 2022, 3:36:03 AM7/21/22
to Yuki Kimoto, re2-dev
On Thu, Jul 21, 2022 at 4:25 PM Yuki Kimoto <kimot...@gmail.com> wrote:

I delete RE2 object in the destructor(DESTROY method, a SPVM feature). This didn't work well.

If SPVM__Regex__compile() returns here or this call to new_pointer_by_name() returns an error without having stored the pointer, then the RE2 object will be leaked.

Because I can't write the code submatch is freed automatically using unique/shared pointer, I write the code at the end of the function.

I suggest using std::vector<re2::StringPiece> for submatch. That should look quite similar to the previous code involving RE2::Arg and then you shouldn't need to use goto anymore.

Yuki Kimoto

unread,
Jul 21, 2022, 4:27:29 AM7/21/22
to Paul Wankadia, re2-dev
On Thu, Jul 21, 2022 at 4:25 PM Yuki Kimoto <kimot...@gmail.com> wrote:

I delete RE2 object in the destructor(DESTROY method, a SPVM feature). This didn't work well.

If SPVM__Regex__compile() returns here or this call to new_pointer_by_name() returns an error without having stored the pointer, then the RE2 object will be leaked.


Because I can't write the code submatch is freed automatically using unique/shared pointer, I write the code at the end of the function.

I suggest using std::vector<re2::StringPiece> for submatch. That should look quite similar to the previous code involving RE2::Arg and then you shouldn't need to use goto anymore.


Paul Wankadia

unread,
Jul 21, 2022, 8:03:06 AM7/21/22
to Yuki Kimoto, re2-dev
Cool. :)

It's probably worth running the SPVM test suites under sanitizers; see https://gcc.gnu.org/onlinedocs/gcc/Instrumentation-Options.html and the various -fsanitize= flags.

Yuki Kimoto

unread,
Jul 21, 2022, 7:02:33 PM7/21/22
to Paul Wankadia, re2-dev

Sorry, what are sanitizers? I have not used these yet in my life.



2022年7月21日(木) 21:03 Paul Wankadia <jun...@google.com>:

Paul Wankadia

unread,
Jul 22, 2022, 1:59:25 AM7/22/22
to Yuki Kimoto, re2-dev
On Fri, Jul 22, 2022 at 9:02 AM Yuki Kimoto <kimot...@gmail.com> wrote:

Sorry, what are sanitizers? I have not used these yet in my life.

They detect various classes of bugs via runtime instrumentation:

"Enable AddressSanitizer, a fast memory error detector. Memory access instructions are instrumented to detect out-of-bounds and use-after-free bugs."

"Enable ThreadSanitizer, a fast data race detector. Memory access instructions are instrumented to detect data race bugs."

"Enable LeakSanitizer, a memory leak detector. This option only matters for linking of executables and the executable is linked against a library that overrides malloc and other allocator functions."

"Enable UndefinedBehaviorSanitizer, a fast undefined behavior detector. Various computations are instrumented to detect undefined behavior at runtime."

Note that undefined behavior (https://en.cppreference.com/w/cpp/language/ub) makes a program incorrect from the compiler's perspective and so GIGO ("garbage in, garbage out") applies... with quite astonishing results sometimes. (My favourite example is when both branches of an if are executed.)

Yuki Kimoto

unread,
Jul 22, 2022, 5:10:57 AM7/22/22
to Paul Wankadia, re2-dev
Thanks.

I will try it next week.

These tests are personal tests on only gcc, aren't they?

Personal tests means the tests are not automatically tests that are executed at the user's environment.



2022年7月22日(金) 14:59 Paul Wankadia <jun...@google.com>:

Paul Wankadia

unread,
Jul 25, 2022, 10:03:37 AM7/25/22
to Yuki Kimoto, re2-dev
On Fri, Jul 22, 2022 at 7:10 PM Yuki Kimoto <kimot...@gmail.com> wrote:

These tests are personal tests on only gcc, aren't they?

FWIW, Clang also supports sanitizers.

Personal tests means the tests are not automatically tests that are executed at the user's environment.

Indeed, it would generally suffice to specify -fsanitize= only in the build flags for the testing done by the project itself (typically via the project's CI such as GitHub Actions).

Yuki Kimoto

unread,
Jul 25, 2022, 6:09:43 PM7/25/22
to Paul Wankadia, re2-dev
Thanks! I will try it.


2022年7月25日(月) 23:03 Paul Wankadia <jun...@google.com>:

Yuki Kimoto

unread,
Jul 31, 2022, 7:37:53 PM7/31/22
to Paul Wankadia, re2-dev
Paul Wankadia

Sorry, I found some bugs of SPVM and SPVM::Regex. I should fix and enhance them.

I will return to use sanitizer option after this work.

2022年7月26日(火) 7:09 Yuki Kimoto <kimot...@gmail.com>:
Reply all
Reply to author
Forward
0 new messages