
QuoteMeta in PCRE2?


Felipe Gasper

Dec 13, 2023, 2:20:46 PM
to pcre...@googlegroups.com
Hello,

Was QuoteMeta’s omission from PCRE2 intentional? I’m looking for logic to escape a string for inclusion into a PCRE2 regexp, and while it seems safe just to copy-paste the old RE::QuoteMeta from PCRE1, first I’m curious why the library no longer includes this logic.

Thank you!

cheers,
-Felipe Gasper
Mississauga, ON

Ze'ev Atlas

Dec 13, 2023, 7:34:27 PM
to Felipe Gasper, 'Felipe Gasper' via PCRE2 discussion list
I do not know Phil's reasons, so we have to wait for him or Zoltan to answer. In the meantime, could you test that the PCRE1 module works as advertised? If it does, we could try to integrate it.

Philip Hazel

Dec 14, 2023, 4:17:01 AM
to Felipe Gasper, pcre...@googlegroups.com
I believe you are referring to something that was part of the C++ wrapper for PCRE1. This was not carried over to PCRE2 because by that time there was nobody maintaining it. I am not a C++ person. The wrapper was contributed by Google, and maintained by them, but then their people who were involved moved on or left. When PCRE2 was being created there was nobody who expressed an interest in porting the wrapper. There was also another reason for its omission. In the years since it was created, people had pointed out that there were several possible ways of mapping the PCRE functions onto C++ (as I am not a C++ person I can't give more details). So it seemed better to leave any C++ wrapping of PCRE2 for others to implement.

Philip


Ze'ev Atlas

Dec 14, 2023, 10:24:32 PM
to philip...@gmail.com, Felipe Gasper, pcre...@googlegroups.com
Hi Felipe,
In that case, could you please look at the original PCRE1 code and see whether it is feasible to isolate the quotemeta function itself? Perhaps it does not rely on too much C++ mumbo jumbo and we could convert it into C.
I may or may not have access to the PCRE1 code, but it may be interesting and cost-effective to do it.
Are there other "missing" functions?

Giuseppe D'Angelo

Dec 15, 2023, 4:12:39 AM
to Ze'ev Atlas, philip...@gmail.com, Felipe Gasper, pcre...@googlegroups.com
On Fri, 15 Dec 2023 at 04:24, 'Ze'ev Atlas' via PCRE2 discussion list
<pcre...@googlegroups.com> wrote:
>
> Hi Felipe,
> In that case, could you please look at the original PCRE1 code and see whether it is feasible to isolate the quotemeta function itself? Perhaps it does not rely on too much C++ mumbo jumbo and we could convert it into C.
> I may or may not have access to the PCRE1 code, but it may be interesting and cost-effective to do it.
> Are there other "missing" functions?

The logic behind perl's quotemeta and QRegularExpression::escape (and
PCRE1's QuoteMeta, I assume) is extremely straightforward: iterate on
all code points, and if they're outside of the [a-zA-Z0-9_] set,
escape them with a backslash.
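
To make that rule concrete, here is a minimal sketch in C. It assumes plain ASCII, NUL-terminated input and a caller-supplied buffer that is large enough; the names `is_word_char` and `quote_meta_ascii` are made up for illustration and are not part of PCRE1, PCRE2, Perl, or Qt.

```c
#include <stddef.h>

/* Hypothetical helper: returns 1 if c is in [a-zA-Z0-9_], else 0. */
static int is_word_char(unsigned char c)
{
    return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') ||
           (c >= '0' && c <= '9') || c == '_';
}

/* Hypothetical function: copies the NUL-terminated ASCII string `in`
 * into `out`, putting a backslash in front of every character that is
 * not in [a-zA-Z0-9_]. Assumption: `out` has room for at least
 * 2 * strlen(in) + 1 bytes. Returns the number of bytes written, not
 * counting the terminating NUL. */
static size_t quote_meta_ascii(const char *in, char *out)
{
    size_t n = 0;

    for (; *in != '\0'; in++) {
        if (!is_word_char((unsigned char)*in))
            out[n++] = '\\';
        out[n++] = *in;
    }
    out[n] = '\0';
    return n;
}
```

With this sketch, the input `a+b_1` would be quoted as `a\+b_1`.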

I implemented this from scratch in 2012 [1]; it's 20 lines that have
never changed since.

Complications:
1) NUL must be escaped with "\\0" (backslash + 0) and not "\\\0"
(backslash + NUL), because pcre_compile uses a NUL-terminated string.

2) Handling of Unicode, specifically multiple code-unit sequences (in
UTF-8/16). Only the first code unit needs a backslash in front of it;
the others need to be copied as-is. How to determine whether it's the
first is left to you, as there are no proper facilities in C for this.
(Of course it's trivial to do, I'm just ranting :-)). Also, you get to
pick what happens in case of illegal sequences. In Qt I do garbage
in/garbage out.

3) Memory management in C. In C++ we use `string` (or equivalent),
store the quoted string into it, return it, and call it a day. In C
you need the usual dance -- have the caller pass you output pointer +
size, be sure not to overflow it, return how much you would actually
need to write... the usual drill. Luckily, there's an upper bound the
caller can always rely upon (and you can document it): at most, you'll
write twice the size of the input string.
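
The three complications above could be handled along these lines. This is only a sketch under stated assumptions, not the Qt code from [1] and not anything shipped with PCRE: the input is UTF-8 given as pointer + length (so it may contain NUL bytes), invalid sequences are passed through (garbage in/garbage out), and the names `is_word_byte` and `quote_meta_utf8` are hypothetical. The return convention is modelled on `snprintf`: the function reports how many bytes the full result needs, even if the buffer was too small.

```c
#include <stddef.h>

/* Hypothetical helper: returns 1 if c is in [a-zA-Z0-9_], else 0. */
static int is_word_byte(unsigned char c)
{
    return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') ||
           (c >= '0' && c <= '9') || c == '_';
}

/* Hypothetical function: quotes `in_len` bytes of UTF-8 from `in` into
 * `out`, which has room for `out_cap` bytes including the terminating
 * NUL. Returns the number of bytes the complete result needs, not
 * counting the NUL; this is never more than 2 * in_len. The output is
 * complete only if the return value is less than `out_cap`. Calling
 * with out_cap == 0 just computes the size and never touches `out`. */
size_t quote_meta_utf8(const char *in, size_t in_len,
                       char *out, size_t out_cap)
{
    size_t needed = 0;

    for (size_t i = 0; i < in_len; i++) {
        unsigned char c = (unsigned char)in[i];
        char piece[2];
        size_t n = 0;

        if (c == 0) {
            /* Complication 1: backslash + '0', not backslash + NUL. */
            piece[n++] = '\\';
            piece[n++] = '0';
        } else if (is_word_byte(c) || (c & 0xC0) == 0x80) {
            /* Word characters and UTF-8 continuation bytes
             * (10xxxxxx) are copied as-is (complication 2). */
            piece[n++] = (char)c;
        } else {
            /* Everything else, including UTF-8 lead bytes, gets a
             * backslash in front of it. */
            piece[n++] = '\\';
            piece[n++] = (char)c;
        }

        /* Complication 3: keep counting, but never write past the
         * buffer; the last slot is reserved for the NUL. */
        for (size_t k = 0; k < n; k++, needed++) {
            if (out_cap > 0 && needed < out_cap - 1)
                out[needed] = piece[k];
        }
    }

    if (out_cap > 0)
        out[needed < out_cap ? needed : out_cap - 1] = '\0';
    return needed;
}
```

A caller can either allocate `2 * in_len + 1` bytes up front, or call once with `out_cap` set to 0 to get the exact size and then call again. One caveat with the backslash + 0 form: PCRE reads up to two further octal digits after `\0`, so a NUL immediately followed by an octal digit in the input would need a longer form such as `\000`, which this sketch does not attempt.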


[1] https://codereview.qt-project.org/c/qt/qtbase/+/12319/21/src/corelib/tools/qregularexpression.cpp#1437

Hope this helps,
--
Giuseppe D'Angelo