Replacing regex library with PCRE

Vadim Zeitlin

Jul 9, 2021, 11:54:16 AM
to wx-dev
Hello,

I've been thinking for a very long time about replacing the regex
library we use with PCRE (actually PCRE2 by now, but the difference between
PCRE and PCRE2 doesn't really matter here, so I'm going to continue to call
it just PCRE). I'd like to finally either do it or decide that we're not
going to do it at all, because this change should be done before 3.2.0, and
I would appreciate your thoughts about this.

First, let me give the reasons I think we should do it:

0. Our library (*.c files in src/regex) is a custom version of Henry
Spencer's original regex, which has not been maintained for 15+ years and
almost certainly never will be maintained by anybody. Of course, maybe
we are never going to need to touch it, but if we ever needed to
make any non-trivial changes due to e.g. porting to a new architecture,
I have no idea how we would do it. PCRE is actively developed, used by
tons of important projects and generally seems like a much safer
dependency.

1. PCRE RE syntax is much richer than even the advanced syntax of the
current library. Just being able to use named captures would be
already great, but PCRE has many more features that go beyond
convenience of writing REs, e.g. "\K" escape sequence and much more.

2. PCRE is faster. The exact numbers depend on the RE and the input, but
speed does matter, especially as it's the main reason why I still would
even use wxRegEx rather than std::regex nowadays (more about this
below).

3. We could use the PCRE system library, instead of always compiling our own
regex code, as we have to do now because we can never use the system
library in the Unicode build.


Of course, there are also good reasons for not doing it:

0. This is one more thing to do before 3.2.0, when releasing it should be
the highest priority. OTOH I've already done quite a bit of work on it
in the past (i.e. I have a working wxRegEx version based on PCRE that
passes the tests), so it shouldn't take that much time.

1. This change will be at least somewhat backwards incompatible because
PCRE syntax is just not the same. AFAICS the changes should affect only
particularly weird REs or very uncommon RE features (did anybody else
even know about directors, introduced by leading "***:", in our regex
engine? I've only learnt about them after looking at the test suite),
but it's still a possible silent breakage. The only way to avoid it
would be to keep the existing wxRegEx implementation and add a parallel
wxPCRegEx or something like this using PCRE, but I don't think it's a
good idea, i.e. that it's better than just doing nothing at all.

2. Spending any time on wxRegEx might be seen as a waste of time, now that
std::regex exists. However, in practice there are still reasons to use
wxRegEx, the main one being speed. std::regex is notoriously slow,
and in a real-life program that I care about the PCRE-based version is ~15
times faster than the std::regex one. And this is only with the latest and
best performing std::regex from gcc 11; previously the difference used
to be more like 150 times, and it's still 140 right now with the latest
libc++ version used by clang-12. So using PCRE is still a very good idea
if you care about performance at all, especially in a program working on
multiple platforms.

3. If we want to use the system library, we'd depend on libpcre2-32 in the
default Unicode build, not on libpcre2-8 that almost everybody else uses
(somewhat surprisingly, Qt uses libpcre2-16), and libpcre2-32 is much less
likely to be installed. But at least it does exist, and the UTF-8 build
could use the more common library.
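
For readers who want to check the std::regex side of the speed comparison in
point 2 on their own input, a minimal timing sketch might look like this. It
measures std::regex only; BenchResult and TimeStdRegex are names made up for
this illustration, and the ratios quoted above come from the author's own
program, not from this code.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Hypothetical result type for this sketch: how many searches succeeded
// and how much wall-clock time the searching took.
struct BenchResult
{
    std::size_t matches;
    double milliseconds;
};

// Hypothetical helper: compiles `pattern` once, then searches every line
// `repeats` times and reports the elapsed time.
BenchResult TimeStdRegex(const std::string& pattern,
                         const std::vector<std::string>& lines,
                         int repeats)
{
    assert(repeats >= 0);

    // Compile outside the timed region so that only matching is measured.
    const std::regex re(pattern,
                        std::regex::ECMAScript | std::regex::optimize);

    std::size_t matches = 0;
    const auto start = std::chrono::steady_clock::now();
    for (int r = 0; r < repeats; ++r)
    {
        for (const std::string& line : lines)
        {
            if (std::regex_search(line, re))
                ++matches;
        }
    }
    const auto stop = std::chrono::steady_clock::now();

    const std::chrono::duration<double, std::milli> elapsed = stop - start;
    return BenchResult{matches, elapsed.count()};
}
```

A wxRegEx measurement would presumably wrap the same loop around
wxRegEx::Matches(); that variant is not shown because it needs wxWidgets to
build.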


Does anybody else see any reasons for [not] switching to PCRE? Right now
I'm conflicted because on one hand I have very little time for anything
wx-related and it seems wasteful to spend it on this because it's probably
not that important, one way or the other, but OTOH I still think it would
be an improvement and I also want to finish, rather than throw away, the
work I had already done on this. Which is, admittedly, not a very good
reason to do it, but I can't deny that it plays a role.

What do you think?
VZ

wsu

Jul 9, 2021, 10:51:10 PM
to wx-dev
For me, the point of wx is platform independence.  Since there are already well-known platform-independent regex libraries, having wx provide one is redundant.

Therefore, I think the question should be:  Is wxRegEx mainly for internal use, or for library clients?  If it's for internal use, switch to a maintained library.  If it's for clients, remove it.  Yes, it's a breaking change, but at least it's not a subtle one, which switching to PCRE could be.  

utelle

Jul 10, 2021, 4:46:50 AM
to wx-dev
Hi Vadim,

although I'm not a heavy user of regular expressions, in the cases where I needed them I was happy that wxWidgets provided a wxRegEx class, sparing me from adding yet another dependency. A couple of times I was on the verge of using PCRE2 instead of wxRegEx due to performance and/or available features. Therefore I'm all in favor of replacing the underlying RE implementation of wxWidgets with PCRE2.

IMHO wxRegEx should be modernized, but should not be removed from wxWidgets. In the latter aspect I disagree with wsu. Based on PCRE2 wxRegEx will be a fully fledged RE implementation, and if I understood you correctly, you intend to use the PCRE2 system library on systems where it is available.

I always liked wxWidgets for providing support for most typical development needs within a single framework. No hassle with tons of separate library dependencies or with finding suitable platform-independent libraries.

For sure there are candidates for removal from wxWidgets, like wxHTTP or wxFTP, due to their limited functional efficiency. But wxRegEx is not one of them.

Regards,
Ulrich

Vadim Zeitlin

Jul 10, 2021, 7:14:20 AM
to wx-...@googlegroups.com
On Fri, 9 Jul 2021 19:51:10 -0700 (PDT) wsu wrote:

w> For me, the point of wx is platform independence. Since there are already
w> well-known platform-independent regex libraries, having wx provide one is
w> redundant.

Yes, I agree, I probably wouldn't add wxRegEx to wx today, but the
question is not about whether to add it, but about what to do with the
existing class.

w> Therefore, I think the question should be: Is wxRegEx mainly for internal
w> use, or for library clients? If it's for internal use, switch to a
w> maintained library. If it's for clients, remove it.

We just don't do this unless we have a truly excellent reason to. In this
case there is no reason to break the existing code using it.

w> Yes, it's a breaking change, but at least it's not a subtle one, which
w> switching to PCRE could be.

So the choice is between making this (silent) change and doing nothing at
all. Normally I'd again agree that silently breaking existing code is
unacceptable, but in this case it's more complicated as the vast majority
of the common regexes will continue to work just fine. But yes, there are
some differences. I couldn't find any code using wxRegEx that might be
affected by them in the usual places (GitHub, Debian code search), but this
doesn't mean anything, of course.

The trouble is that if we don't make this change now, we commit to
continuing to use the Tcl regex library and syntax for the 3.2 lifetime,
i.e. several more years at the very least. And I am not sure we want to do
this.

Regards,
VZ

wsu

Jul 10, 2021, 4:14:52 PM
to wx-dev
On Saturday, July 10, 2021 at 7:14:20 AM UTC-4 VZ wrote:
> On Fri, 9 Jul 2021 19:51:10 -0700 (PDT) wsu wrote:
>
> w> Therefore, I think the question should be: Is wxRegEx mainly for internal
> w> use, or for library clients? If it's for internal use, switch to a
> w> maintained library. If it's for clients, remove it.
>
> We just don't do this unless we have a truly excellent reason to. In this
> case there is no reason to break the existing code using it.

I'm not surprised, but you did ask our opinion.

> w> Yes, it's a breaking change, but at least it's not a subtle one, which
> w> switching to PCRE could be.
>
> So the choice is between making this (silent) change and doing nothing at
> all. Normally I'd again agree that silently breaking existing code is
> unacceptable, but in this case it's more complicated as the vast majority
> of the common regexes will continue to work just fine. But yes, there are
> some differences. I couldn't find any code using wxRegEx that might be
> affected by them in the usual places (GitHub, Debian code search), but this
> doesn't mean anything, of course.
>
> The trouble is that if we don't make this change now, we commit to keeping
> using TCL regex library and syntax for 3.2 lifetime, i.e. several more
> years at the very least. And I am not sure we want to do this.

I consider using unmaintained code a severe risk.  Since I was already advocating for breaking changes (removing wxRegEx entirely) in preference to keeping unmaintained code, my second choice is the breaking change of switching to PCRE.

> Regards,
> VZ

Vadim Zeitlin

Jul 10, 2021, 5:20:58 PM
to wx-...@googlegroups.com
On Sat, 10 Jul 2021 13:14:52 -0700 (PDT) wsu wrote:

w> On Saturday, July 10, 2021 at 7:14:20 AM UTC-4 VZ wrote:
w>
w> > On Fri, 9 Jul 2021 19:51:10 -0700 (PDT) wsu wrote:
w> >
w> > w> Therefore, I think the question should be: Is wxRegEx mainly for
w> > w> internal use, or for library clients? If it's for internal use,
w> > w> switch to a maintained library. If it's for clients, remove it.
w> >
w> > We just don't do this unless we have a truly excellent reason to. In this
w> > case there is no reason to break the existing code using it.
w>
w> I'm not surprised, but you did ask our opinion.

Yes, and I do appreciate all answers, thank you, but, still, we're not
going to just remove wxRegEx, preventing people using it from upgrading to
wx 3.2. At the very least, we'd need to deprecate it first.

w> I consider using unmaintained code a severe risk. Since I was already
w> advocating for breaking changes (removing wxRegEx entirely) in preference
w> to keeping unmaintained code, my second choice is the breaking change of
w> switching to PCRE.

Let's see if there are any objections to doing this...

Thanks again,
VZ

redtide

Jul 11, 2021, 6:05:37 AM
to wx-dev
I don't think I have ever used it. In the past there was a kind of "wxMania" of wrapping everything, but then I learned (from you) that less is better for a library, and that for the non-UI parts there are other libraries out there doing a better job. Following that idea, if you ask me and if it were possible, I would remove not only wxRegEx, wxHTTP and wxFTP but even others like wxXML (deprecating them first, of course), though I can't imagine what the consequences would be.
Would it be possible to make it an option to build PCRE statically like PNG, JPEG and others if you decide to use it?

Vadim Zeitlin

Jul 11, 2021, 7:04:15 AM
to wx-...@googlegroups.com
On Sun, 11 Jul 2021 03:05:37 -0700 (PDT) redtide wrote:

r> I think I never used it. There was some "wxMania" of anything in the past,
r> then I learned (from you) that the less is better for a library and there
r> are other ones doing a better job out there talking about non UI related
r> ones. From this concept if you ask me and it was possible not only I would
r> remove it, wxHTTP, wxFTP

We should consider deprecating those after 3.2, but the real problem is
not those, but wxSocket itself. Unfortunately removing it would probably
break a lot of code, as it's used in many, many different projects, and
replacing it with something else is not simple due to its idiosyncratic
behaviour.

r> but even others like wxXML,

This one is used for XRC loading, so it will stay.

r> Would it be possible to make it an option to build PCRE statically like
r> PNG, JPEG and others if you decide to use it?

Yes, it will be handled just as the other 3rd party libraries. I
considered making this more flexible, but the conclusion of the thread
https://groups.google.com/g/wx-dev/c/7vN792GRyw8/m/arEcfKpbAQAJ was pretty
unequivocal and we're not going to change anything here.

Regards,
VZ

Teodor Petrov

Jul 11, 2021, 2:08:39 PM
to wx-...@googlegroups.com
On 7/9/21 6:54 PM, Vadim Zeitlin wrote:
> ... I'd like to finally either do it or decide that we're not going
> to do it at all because this change should be done before 3.2.0 and would
> appreciate your thoughts about this...

Off-topic: I think the actual problem here is that the release cycle of
wxWidgets doesn't match development practices/needs of 2021.
It also doesn't match the time the wxWidgets developers can dedicate to
the project.

I think you should first adjust that.
I think you should drop this feature-based-release style and switch to
time-based release style.
I think you should release:
1. a stable version every 3 or 6 months - these releases should contain
whatever is backported to the stable branch.
2. a new stable branch should be made every 1 or 2 years, just create a
branch with whatever has landed in master, stabilize it for a month and
release.

Benefits:
1. new code lands in people's hands more often. Code::Blocks suffered a
lot in the 2.8 to 3.0 transition. We started it late and I think we
still have some regressions, because there were a lot of silent
(run-time only) breakages (most of them caused by our incorrect API use).
2. you can make API/ABI-breaking decisions more often.
3. deprecating stuff and then removing it could happen more often. Or
the decision could be reverted faster.

Problems:
1. you'll have to land fewer or incomplete features sometimes
2. you'll have to do more work doing releases, but you still do it for
the 3.1.x releases at the moment... you've already done 5 since 3.0
landing, so I suppose it will be the same.

And for downstream projects this is already how we operate - we're using
the unstable branch where we can - MSW and Cocoa.
But there is that added burden of linux support where we're stuck with
the stable branch.
Sure we'll be stuck on older stable releases in the new scheme...

I'm just a bystander watching how every target release date is missed
by years. :)

end off-topic;

On topic:
Code::Blocks is using wxRegEx quite a lot, if you give me a branch I
could probably do some testing.
A speed boost would be welcome also.
I suppose it will be best if you switch to PCRE, but provide some API
and env vars which allow the user/application to disable it.
I'm not sure whether it makes sense to have a per-use flag for
disabling/enabling it, or whether the API should provide just global
control.

Best regards,
Teodor

Kenneth Porter

Jul 11, 2021, 3:22:58 PM
to wx-...@googlegroups.com
--On Sunday, July 11, 2021 2:04 PM +0200 Vadim Zeitlin
<va...@wxwidgets.org> wrote:

> We should consider deprecating those after 3.2, but the real problem is
> not those, but wxSocket itself. Unfortunately removing it would probably
> break a lot of code, as it's used in many, many different projects, and
> replacing it with something else is not simple due to its idiosyncratic
> behaviour.

This just crossed my feed. The APIs are Apple-specific but the concepts
might be of interest to wx users who want their apps to be responsive. QUIC
libraries are now available for a number of platforms and new code should
consider using them.

<https://developer.apple.com/videos/play/wwdc2021/10239/>

You can find a list of libraries here:

<https://en.wikipedia.org/wiki/QUIC>

The ideas have spawned a new mailing list for discussing "round trips per
minute" (RPM), a new metric for measuring network latency and hence user
experience.

<https://lists.bufferbloat.net/listinfo/rpm>

Vadim Zeitlin

Jul 11, 2021, 4:04:39 PM
to wx-...@googlegroups.com
On Sun, 11 Jul 2021 21:07:27 +0300 Teodor Petrov wrote:

TP> On 7/9/21 6:54 PM, Vadim Zeitlin wrote:
TP> > ... I'd like to finally either do it or decide that we're not going
TP> > to do it at all because this change should be done before 3.2.0 and would
TP> > appreciate your thoughts about this...
TP>
TP> Off-topic: I think the actual problem here is that the release cycle of
TP> wxWidgets doesn't match development practices/needs of 2021.

This is indeed a different and vast topic. To be brief, the problem is
that wx hasn't been written with ABI in mind, and I don't see any realistic
solution to this. Having multiple ABI-incompatible "stable" releases won't
help. We should re-discuss this, but after 3.2.0; we're not going to change
anything drastically until then.

TP> On topic:
TP> Code::Blocks is using wxRegEx quite a lot, if you give me a branch I
TP> could probably do some testing.

This would be great, TIA!

TP> A speed boost would be welcome also.
TP> I suppose it will be best if you switch to PCRE, but provide some API
TP> and env vars which allow the user/application to disable it.

I considered this, but I'd like to avoid it. The goal is, after all, to
get rid of the old and unmaintained code, not to add to it. So, if
possible, I'd like to replace it with PCRE, not provide a PCRE-based
alternative.

Did you ever use any features specific to our regex library, such as the
previously mentioned directors or any other unusual features documented in
https://www.tcl.tk/man/tcl8.2.3/TclCmd/re_syntax.html ?

TP> I'm not sure it makes sense to have a per use flag for
TP> disabling/enabling it or the api provides just global control.

If we must have it, it would be a build option, not something in the API.
But, again, I hope we can avoid it to keep things simpler.

Regards,
VZ

Teodor Petrov

Jul 11, 2021, 4:52:01 PM
to wx-...@googlegroups.com
On 7/11/21 11:04 PM, Vadim Zeitlin wrote:
> This is indeed a different and vast topic. To be brief, the problem is
> that wx hasn't been written with ABI in mind, and I don't see any realistic
> solution to this. Having multiple ABI-incompatible "stable" release won't
> help. We should re-discuss this, but after 3.2.0, we're not going to change
> anything drastically until it.

<shrug>


> I considered this, but I'd like to avoid it. The goal is, after all, to
> get rid of the old and unmaintained code, not to add to it. So, if
> possible, I'd like to replace it with PCRE, not provide a PCRE-based
> alternative.

Mark the APIs for enabling the old mode to be deprecated and drop them
by 3.4.0.
Default to the new mode...

> Did you ever use any features specific to our regex library, such as the
> previously mentioned directors or any other unusual features documented in
> https://www.tcl.tk/man/tcl8.2.3/TclCmd/re_syntax.html ?

No idea. Hyrum's law is probably in effect.
But we can adjust these. Some of our regex uses are in user controlled
config files.
Find in files has access to the regex engine, but I'm not sure which
mode we use.
We use regexes. :)

> If we must have it, it would be a build option, not something in the API.
> But, again, I hope we can avoid it to keep things simpler.

Per-app/env control would be required on Linux, where you have a single
library and many applications using it.

/Teodor

Eran Ifrah

Jul 13, 2021, 12:56:00 PM
to wx-...@googlegroups.com
My 2 cents:

- Any move to PCRE over the current, unmaintained regex library is a welcome change (and appreciated)
- You mentioned that some regex patterns will get broken, which I believe is probably unacceptable (I have probably hundreds of regex objects in the CodeLite code base, and testing them all is not a realistic task), so I think that a new wxRegExPCRE class is the way to go (as you suggested)
- We should also add wxDEPRECATED to the wxRegEx class (by "we" I mean you :D)

Thanks!

--

Eran Ifrah
Author of CodeLite IDE https://codelite.org

Vadim Zeitlin

Jul 13, 2021, 1:15:04 PM
to wx-...@googlegroups.com
On Tue, 13 Jul 2021 19:55:46 +0300 Eran Ifrah wrote:

EI> - You mentioned that some regex pattern will get broken, which I believe is
EI> probably unacceptable

I would normally say it too, but REs that would be broken by this should be
really, really rare -- so rare that they might even be non-existent. They
do occur in the current test suite which checks for the various special
cases, but I think the probability of them appearing in the wild is very
low.

E.g. I consider myself to be reasonably fluent in regexish, but I have to
admit that I still don't know what behaviour I would expect from the regex
/(week|wee)(night|knights)/ applied to the string "weeknights", which is
one of the few test cases where the behaviour differs. And I have a lot of
trouble believing somebody would actually rely on the current behaviour
(which matches the second alternatives, rather than the first ones as
PCRE does) in practice.
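
The two behaviours described above can be reproduced without either library,
as a rough illustration only. The sketch below uses std::regex as a stand-in:
its ECMAScript grammar tries alternatives in order, as PCRE does, while its
POSIX "extended" grammar is specified to pick the leftmost-longest match,
which gives the same result as the current library for this pattern.
DescribeMatch is a made-up helper name; neither wxRegEx nor PCRE is involved.

```cpp
#include <cassert>
#include <cstddef>
#include <regex>
#include <string>

// Hypothetical helper for this sketch: searches `text` with `pattern` under
// the given std::regex grammar and returns the whole match followed by all
// capture groups, separated by '|'.
std::string DescribeMatch(const std::string& pattern,
                          const std::string& text,
                          std::regex::flag_type grammar)
{
    const std::regex re(pattern, grammar);

    std::smatch m;
    if (!std::regex_search(text, m, re))
        return "no match";

    // A successful search always yields at least the whole match in m[0].
    assert(!m.empty());

    std::string out;
    for (std::size_t i = 0; i < m.size(); ++i)
    {
        if (i != 0)
            out += '|';
        out += m[i].str();
    }
    return out;
}
```

Called with "(week|wee)(night|knights)" and "weeknights", the
std::regex::ECMAScript grammar gives "weeknight|week|night" (first
alternatives, as with PCRE). With std::regex::extended a conforming standard
library should give "weeknights|wee|knights" (second alternatives), but this
is worth verifying with the toolchain at hand, as not every implementation
is known to follow the leftmost-longest rule.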

Other incompatibilities are, admittedly, less confusing, but still should
be very rare, I think:

- POSIX collating elements "[." and "[:" are not supported in PCRE, but
I couldn't find anybody actually using them.
- "\b" and "\B" are boundary assertions in PCRE, but escape sequences for
backspace and backslash currently.
- "\Uxxxxxxxx" is not supported by PCRE (much more common "\uxxxx" is).
- "\xNNNN" is not supported by PCRE, only "\xNN" is.
- Switching to "extended" mode inside the regex using "(?e)" doesn't work,
as we're always in the same ("advanced") mode with PCRE, but this is
again something I don't think anybody does.
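
For code bases with many patterns, a crude textual scan for some of the
constructs listed above may give a first data point. The following is only a
sketch under stated assumptions: FindRiskyConstructs is a hypothetical helper,
not part of wxWidgets or PCRE. It does not parse the regex, so it can report
false positives (e.g. "\b" inside a bracket expression), and it checks only
for the leading "***:" director, "(?e)", "\b"/"\B", "\U", "\x" followed by
more than two hex digits, and "[.".

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: returns one note per construct found in `pattern`
// that the list above describes as behaving differently under PCRE.
std::vector<std::string> FindRiskyConstructs(const std::string& pattern)
{
    std::vector<std::string> notes;

    // Directors are only meaningful at the very start of the pattern.
    if (pattern.compare(0, 4, "***:") == 0)
        notes.push_back("leading \"***:\" director");

    if (pattern.find("(?e)") != std::string::npos)
        notes.push_back("\"(?e)\" switch to extended mode");

    for (std::size_t i = 0; i < pattern.size(); ++i)
    {
        const char c = pattern[i];
        if (c == '\\' && i + 1 < pattern.size())
        {
            const char next = pattern[i + 1];
            if (next == 'b' || next == 'B')
            {
                notes.push_back("\"\\b\" or \"\\B\" changes meaning");
            }
            else if (next == 'U')
            {
                notes.push_back("\"\\U\" escape");
            }
            else if (next == 'x')
            {
                // Count the hex digits following "\x".
                std::size_t digits = 0;
                while (i + 2 + digits < pattern.size() &&
                       std::isxdigit(static_cast<unsigned char>(
                           pattern[i + 2 + digits])))
                {
                    ++digits;
                }
                assert(i + 2 + digits <= pattern.size());
                if (digits > 2)
                    notes.push_back("\"\\x\" with more than two hex digits");
            }
            ++i; // skip the escaped character so that "\\b" is not misread
        }
        else if (c == '[' && i + 1 < pattern.size() && pattern[i + 1] == '.')
        {
            notes.push_back("\"[.\" collating element");
        }
    }

    return notes;
}
```

An empty result does not prove that a pattern behaves identically under both
engines; it only means that none of these particular constructs was spotted.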

EI> (I have probably hundreds of regex objects in CodeLite code base,
EI> testing them all is a non realistic task),

Do you think you use any of the constructs above in any of them? If you
do, this would be a sufficient argument for preserving the current library.
If you don't, it doesn't prove anything, of course, but it would still be a
useful data point...

EI> so, I think that a new wxRegExPCRE class is the way to go

The trouble with this is that having to still keep the current regex
library kills half of the motivation for doing this in the first place.

Regards,
VZ

Vadim Zeitlin

Jul 18, 2021, 6:34:47 PM
to wx-...@googlegroups.com
On Sat, 10 Jul 2021 01:46:50 -0700 (PDT) utelle wrote:

u> Hi Vadim,
u>
u> although I'm not a heavy user of regular expression, in the cases where I
u> needed them I was happy that wxWidgets provided a wxRegEx class sparing me
u> from adding just another dependency. A couple of times I was on the verge
u> of using PCRE2 instead of wxRegEx due to performance and/or available
u> features. Therefore I'm all in favor of replacing the underlying RE
u> implementation of wxWidgets by PCRE2.

Hello again,

There is now https://github.com/wxWidgets/wxWidgets/pull/2438 which does
this. Any comments and, especially, testing would be very welcome!

u> IMHO wxRegEx should be modernized, but should not be removed from
u> wxWidgets. In the latter aspect I disagree with wsu. Based on PCRE2 wxRegEx
u> will be a fully fledged RE implementation, and if I understood you
u> correctly, you intend to use the PCRE2 system library on systems where it
u> is available.

Yes, it will be used if available. Note that in Unicode build you would
need libpcre2-32 rather than the more common libpcre2-8, but at least it is
available in the distribution repositories and is already used by some
other packages (on my Debian system its reverse dependencies show fish and
godot3).

Regards,
VZ

Vadim Zeitlin

Jul 18, 2021, 6:38:07 PM
to wx-...@googlegroups.com
On Sun, 11 Jul 2021 23:50:49 +0300 Teodor Petrov wrote:

TP> Mark the APIs for enabling the old mode to be deprecated and drop them
TP> by 3.4.0.
TP> Default to the new mode...

I did try to do it like this, but it turned out to be too difficult: we
can't choose between the old library and PCRE2 when building it ourselves
using make- or project files or, at least, not without some really horrible
hacks. So, finally, https://github.com/wxWidgets/wxWidgets/pull/2438 simply
replaces the old library with the new one. If we decide that it's too big
of a break, we probably should just forget about switching to PCRE2 and
stay with the current library because even defaulting to the new regex
engine would be a bad idea in this case. I'm rather hopeful that most
people won't even notice this change.

TP> > Did you ever use any features specific to our regex library, such as the
TP> > previously mentioned directors or any other unusual features documented in
TP> > https://www.tcl.tk/man/tcl8.2.3/TclCmd/re_syntax.html ?
TP>
TP> No, idea. Hyrum's law is in effect probably.

Sure, but do you even _know_ about them? I didn't, so I'm pretty sure I
had never used them personally.

Regards,
VZ

utelle

Jul 20, 2021, 10:45:15 AM
to wx-dev
Thanks for your efforts. I'll take a look within the next couple of days.

Regards,
Ulrich

utelle

Jul 22, 2021, 10:38:45 AM
to wx-dev
I successfully compiled wxWidgets with the PCRE based regex library. My own applications using wxRegEx work flawlessly. That is, they are not affected by any differences between PCRE and the previous regex library.

Kind regards,
Ulrich

Vadim Zeitlin

Jul 23, 2021, 10:17:01 AM
to wx-...@googlegroups.com
On Thu, 22 Jul 2021 07:38:44 -0700 (PDT) utelle wrote:

u> I successfully compiled wxWidgets with the PCRE based regex library. My own
u> applications using wxRegEx work flawlessly. That is, they are not affected
u> by any differences between PCRE and the previous regex library.

Thanks for testing, Ulrich!

If there are no objections by Monday, I'll merge this PR into master. I
realize that some people might not have had time to test it yet, and it
wouldn't be a problem to wait more if anybody needs more time, but OTOH I
don't want to delay merging this indefinitely either.

Thanks again,
VZ

David Connet

Aug 1, 2021, 12:33:01 PM
to wx-...@googlegroups.com
On 7/23/2021 7:16 AM, Vadim Zeitlin wrote:
> If there are no objections by Monday, I'll merge this PR into master. I
> realize that some people might not have had time to test it yet, and it
> wouldn't be a problem to wait more if anybody needs more time, but OTOH I
> don't want to delay merging this indefinitely neither.

I just tried building this on my unix system with configure and it's
failing badly on pcre. It looks like a bunch of the configure files were
checked in as DOS.

Hm. Just changed pcre's configure, config.guess, config.sub to unix. And
now I'm running into that with tiff also.

And expat.

This is on a clean clone too.
git - checkout ...
git submodule init
git submodule update --recursive

Dave

Vadim Zeitlin

Aug 1, 2021, 12:45:44 PM
to wx-...@googlegroups.com
On Sun, 1 Aug 2021 09:32:52 -0700 David Connet wrote:

DC> On 7/23/2021 7:16 AM, Vadim Zeitlin wrote:
DC> > If there are no objections by Monday, I'll merge this PR into master. I
DC> > realize that some people might not have had time to test it yet, and it
DC> > wouldn't be a problem to wait more if anybody needs more time, but OTOH I
DC> > don't want to delay merging this indefinitely neither.
DC>
DC> I just tried building this on my unix system with configure and it's
DC> failing badly on pcre. It looks like a bunch of the configure files were
DC> checked in as DOS.
DC>
DC> Hm. Just changed pcre's configure, config.guess, config.sub to unix. And
DC> now I'm running into that with tiff also.
DC>
DC> And expat.
DC>
DC> This is on a clean clone too.
DC> git - checkout ...
DC> git submodule init
DC> git submodule update --recursive

I think you must have some non-default Git option set because these files
definitely have Unix EOLs in the repository. I don't know much about these
options because I never used them, but I'd start by checking what

$ git config core.autocrlf

show for you and changing it if it's anything other than "false".

Regards,
VZ

David Connet

Aug 1, 2021, 1:00:53 PM
to wx-...@googlegroups.com
On 8/1/2021 9:45 AM, Vadim Zeitlin wrote:
> I think you must have some non-default Git option set because these files
> definitely have Unix EOLs in the repository. I don't know much about these
> options because I never used them, but I'd start by checking What does
>
> $ git config core.autocrlf
>
> show for you and changing it if it's anything other than "false".
>
> Regards,
> VZ

Ah bingo! It was true. (Oh, how I just so love git and EOLs...)

I'm recloning now...

(Just ran my build on a different machine and it worked fine)

Thx!!
Dave

Kenneth Porter

Aug 1, 2021, 3:45:19 PM
to wx-...@googlegroups.com
--On Sunday, August 01, 2021 11:00 AM -0700 David Connet
<dc...@agilityrecordbook.com> wrote:

> (Oh, how I just so love git and EOLs...)

Different EOL conventions are the bane of all cross-platform development. I
remember making sure that my text files were all properly marked in
Subversion, setting their MIME type in the file's repo properties.

But at least it's not as bad as the slash-backslash difference!

David Connet

Aug 1, 2021, 4:00:15 PM
to wx-...@googlegroups.com
On 8/1/2021 10:00 AM, David Connet wrote:
> (Just ran my build on a different machine and it worked fine)


Sigh. Ubuntu...

Now, whenever I try to link anything that has a wxRegEx in it (I have
one project using wxSqlite) I get undefined references to
pcre2_match_data_free_32 (and others).

I've configured with "--with-regex=builtin" (no change from BP (before
pcre)).

To rule out wxsqlite, I simply added

> #include "wx/regex.h"
> ...
> wxRegEx x;
> x.IsValid();

to a project that was working, and it also broke with the undefined
reference.

Is there something else I need to do to get this working on Ubuntu? (It
works fine on Windows)

Dave

Mart Raudsepp

Aug 1, 2021, 4:35:50 PM
to wx-...@googlegroups.com
One fine day, Sun, 01.08.2021 at 13:00, David Connet wrote:
> On 8/1/2021 10:00 AM, David Connet wrote:
> > (Just ran my build on a different machine and it worked fine)
>
>
> Sigh. Ubuntu...
>
> one project using wxSqlite) I get undefined references to
> pcre2_match_data_free_32 (and others).
>
> I've configured with "--with-regex=builtin" (no change from BP
> (before pcre)).
>
> To rule out wxsqlite, I simply added
>
>  > #include "wx/regex.h"
>  > ...
>  > wxRegEx x;
>  > x.IsValid();
>
> to an project that was working, and it also broke with the undefined
> reference.
>
> Is there something else I need to do to get this working on Ubuntu?

I understood you need the 32-bit libpcre2 version, which in Ubuntu seems
to be packaged as "libpcre2-32-0".
But then why would it get as far as complaining about undefined
references, instead of failing to find the libpcre2-32.so.0 library?
Maybe something in the wx build configuration step is wrongly happy if
you didn't have that package?

Mart

David Connet

Aug 1, 2021, 4:59:22 PM
to wx-...@googlegroups.com, Mart Raudsepp
On 8/1/2021 1:35 PM, Mart Raudsepp wrote:
>
>> Sigh. Ubuntu...
>>
>> one project using wxSqlite) I get undefined references to
>> pcre2_match_data_free_32 (and others).
>>
>> I've configured with "--with-regex=builtin" (no change from BP
>> (before pcre)).
>>
>> To rule out wxsqlite, I simply added
>>
>>  > #include "wx/regex.h"
>>  > ...
>>  > wxRegEx x;
>>  > x.IsValid();
>>
>> to an project that was working, and it also broke with the undefined
>> reference.
>>
>> Is there something else I need to do to get this working on Ubuntu?
> I understood you need the 32bit libpcre2 version, which in Ubuntu seems
> to be packages as "libpcre2-32-0".
> But then why would it get that far as to complain about undefined
> references, instead of failing to find the libpcre2-32.so.0 library..
> Maybe something in wx build configuration step being wrongly happy if
> you didn't have that package?
>
> Mart
>
Just checked, that's already installed. Plus, I told configure to use
the builtin, so I should be using the pcre that was just added in the
3rdparty directory.

Dave

David Connet

Aug 1, 2021, 5:11:03 PM
to wx-...@googlegroups.com
On 8/1/2021 1:59 PM, David Connet wrote:
> Just checked, that's already installed. Plus, I told configure to use
> the builtin, so I should be using the pcre that was just added in the
> 3rdparty directory.
>
> Dave
>
I did forget to mention one thing... I compile with static libs.

Dave

Vadim Zeitlin

Aug 1, 2021, 5:43:12 PM
to wx-...@googlegroups.com
On Sun, 1 Aug 2021 13:00:07 -0700 David Connet wrote:

DC> Now, whenever I try to link anything that has a wxRegEx in it (I have
DC> one project using wxSqlite) I get undefined references to
DC> pcre2_match_data_free_32 (and others).

This is unexpected, this function should be in libwxregexu-3.1.a which
should be linked in. How exactly are you linking your program?

DC> Is there something else I need to do to get this working on Ubuntu? (It
DC> works fine on Windows)

It's probably a bug, but I don't see what exactly is wrong. Which configure
options exactly do you use? I've tried with --disable-shared --disable-sys-libs
and this worked just fine for me. Do you use any other important
non-default configure options?

Thanks,
VZ

David Connet

Aug 1, 2021, 8:50:52 PM
to wx-...@googlegroups.com
--disable-compat28
--disable-compat30
--disable-mediactrl
--disable-shared
--enable-unicode
--with-gtk=3
--disable-debug_flag (this is different when building release, fails in
both)
--with-cxx=14
--with-expat=builtin
--with-regex=builtin
--with-zlib=builtin
--without-libiconv
--without-liblzma

And env:

        export CXXFLAGS="-std=c++14"
        export OBJCXXFLAGS="-std=c++14"

Dave

David Connet

Aug 2, 2021, 10:55:19 AM
to wx-...@googlegroups.com
Another thing - I'm not seeing -lwxregexu-3.1 in the "wx-config --libs"
output. (I use the output of wx-config in my makefile)

Ah - looks like that's the problem. When I changed my configure.in from:

AC_SUBST(LDFLAGS, "`wx-config --libs`")

to:

AC_SUBST(LDFLAGS, "`wx-config --libs` -lwxregexu-3.1")

that solved the problem.

Dave

Vadim Zeitlin

Aug 2, 2021, 12:12:58 PM
to wx-...@googlegroups.com
On Mon, 2 Aug 2021 07:55:11 -0700 David Connet wrote:

DC> Another thing - I'm not seeing -lwxregexu-3.1 in the "wx-config --libs"
DC> output.

Thanks, this is where the problem was. I've fixed it now in
a4a65f16f6 (Fix linking with builtin regex library, 2021-08-02), even though
I still have no idea how I could have missed it when testing this, as I
definitely did test with built-in PCRE under Unix.

In any case, this should work with the latest master, thanks a lot for
finding this problem!

VZ

David Connet

Aug 2, 2021, 12:54:04 PM
to wx-...@googlegroups.com
Thanks! I can confirm this fixed my problem!

Dave

Teodor Petrov

Aug 3, 2021, 2:45:13 PM
to wx-...@googlegroups.com
On 7/11/21 11:50 PM, Teodor Petrov wrote:
>>   Did you ever use any features specific to our regex library, such
>> as the
>> previously mentioned directors or any other unusual features
>> documented in
>> https://www.tcl.tk/man/tcl8.2.3/TclCmd/re_syntax.html ?
>
> No idea. Hyrum's law is probably in effect.

After a quick test, these two fail to compile:
    m_RE_Unix.Compile(_T("([^$]|^)(\\$[({]?(#?[A-Za-z_0-9.]+)[)}/\\]?)"), wxRE_EXTENDED | wxRE_NEWLINE);
    wxRegEx regexFortranArray(wxT("^\\([0-9,]+)$"));

No idea what they are trying to match actually.
There are no tests and the surrounding code gives very few clues. :(
I'm not even sure if these actually work. :(

And there is another bug - the error message printed in the log has 1
character truncated.

/Teodor

Vadim Zeitlin

Aug 3, 2021, 7:37:19 PM
to wx-...@googlegroups.com
On Tue, 3 Aug 2021 21:44:44 +0300 Teodor Petrov wrote:

TP> On 7/11/21 11:50 PM, Teodor Petrov wrote:
TP> >>   Did you ever use any features specific to our regex library, such
TP> >> as the
TP> >> previously mentioned directors or any other unusual features
TP> >> documented in
TP> > No idea. Hyrum's law is probably in effect.
TP>
TP> After a fast test these two fail to compile:

Thanks for checking!

TP>     m_RE_Unix.Compile(_T("([^$]|^)(\\$[({]?(#?[A-Za-z_0-9.]+)[)}/\\]?)"), wxRE_EXTENDED | wxRE_NEWLINE);

This one seems like a genuine problem: TCL regex library doesn't allow
escaping "]" with a backslash unless wxRE_ADVANCED is used, but PCRE always
does, so "\\]" doesn't match the opening "[", hence the error. This can be
easily fixed by using "\\\\]" instead, which would also work fine with TCL
regex, but this is still an(other) incompatibility. I don't know if we
should try fixing it up in wxRegEx itself... For now I've added this case
to the description of incompatibilities in wxRegEx documentation.

TP>     wxRegEx regexFortranArray(wxT("^\\([0-9,]+)$"));

This one is a bit more confusing as I'm not sure what the intention is
here, but there is still another incompatibility because TCL regex ignores
")" not preceded by (not escaped) "(" while PCRE complains about it. The
fix is to use "\\)", but it's still an incompatibility too, of course. I
don't see any reasonable possibility to fix it at wxRegEx level however, so
I've also just documented it.
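
To make the two fixes above concrete, here is a minimal sketch of how the suggested C++ literals translate into the text the regex engine actually receives. The constant and function names are made up for illustration, and nothing here calls wxRegEx itself; it only compares each escaped literal with the equivalent raw string literal.

```cpp
#include <cassert>
#include <string>

// Hypothetical names, chosen for this sketch only.
// First pattern with "\\\\]" instead of "\\]": inside the bracket expression
// the engine sees two backslashes, so the "]" after them closes the class.
const std::string kUnixVarPattern =
    "([^$]|^)(\\$[({]?(#?[A-Za-z_0-9.]+)[)}/\\\\]?)";

// Second pattern with the closing parenthesis escaped as "\\)".
const std::string kFortranArrayPattern = "^\\([0-9,]+\\)$";

// Raw string literals show the regex text without the C++ escaping layer.
inline void CheckLiterals()
{
    assert(kUnixVarPattern ==
           R"re(([^$]|^)(\$[({]?(#?[A-Za-z_0-9.]+)[)}/\\]?))re");
    assert(kFortranArrayPattern == R"re(^\([0-9,]+\)$)re");
}
```

In real code these strings would be passed to wxRegEx::Compile() with the same flags as before; writing them as raw string literals (C++11 and later) is one way to avoid counting backslashes.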

TP> No idea what they are trying to match actually.
TP> There are no tests and the surrounding code gives very little clues. :(
TP> I'm not even sure if these actually work. :(
TP>
TP> And there is another bug - the error message printed in the log has 1
TP> character truncated.

Oops, this one is definitely a bug, fixed now in 232a3ab577 (Allocate more
space for the wxRegEx error message buffer, 2021-08-04), thanks!
VZ

Teodor Petrov

Aug 4, 2021, 2:39:54 PM
to wx-...@googlegroups.com
Hi again,

Thanks for the explanations.

Some random thoughts:

1. I'm looking at the documentation:

"Using || to embed Unicode code points into the pattern is not supported
any more, use the still supported || , followed by exactly four
hexadecimal digits, or || , followed by exactly two hexadecimal digits,
instead."

The line above seems to have some missing stuff in it. If the rendering
is OK, then the explanation is not good and I have no idea what it is
trying to explain.

2. Doubled "are" in: "Other regexes syntactically invalid according to
POSIX are are re-interpreted "

3. One thing I've also realized today is that we need a way to detect if
wxRegEx is using PCRE at compile time. I'm pretty sure we'll have to add
ifdefs for older wx releases. I'm pretty sure we'll support 3.0.x for
quite some time. From the documentation it seems a check like
wxCHECK_VERSION(3, 1, 6) would be enough. Is this really the case?

4. If I were you I wouldn't be confident in the all-or-nothing change to
PCRE, given the length of "Other changes are:". I suppose time will tell.

/Teodor

Vadim Zeitlin

Aug 4, 2021, 4:20:57 PM
to wx-...@googlegroups.com
On Wed, 4 Aug 2021 21:39:27 +0300 Teodor Petrov wrote:

TP> Some random thoughts:
TP>
TP> 1. I'm looking at the documentation:
TP>
TP> "Using || to embed Unicode code points into the pattern is not supported
TP> any more, use the still supported || , followed by exactly four
TP> hexadecimal digits, or || , followed by exactly two hexadecimal digits,
TP> instead."
TP>
TP> The line above seems to have some missing stuff in it. If the rendering
TP> is OK, then the explanation is not good and I have no idea what it is
TP> trying to explain.

Thanks for noticing this! It's indeed a formatting problem, due to
forgetting to double the backslashes, resulting in missing \U, \u and \x
respectively here. Hopefully the sentence is more understandable now that
I've fixed this.

TP> 2. Double are in: "Other regexes syntactically invalid according to
TP> POSIX are are re-interpreted "

Fixed too, thanks.

TP> 3. One thing I've also realized today is that we need a way to detect if
TP> wxRegEx is using PCRE at compile time. I'm pretty sure we'll have to add
TP> ifdefs for older wx releases. I'm pretty sure we'll support 3.0.x for
TP> quite some time. From the documentation it seems a check like
TP> wxCHECK_VERSION(3, 1, 6) would be enough. Is this really the case?

Yes. My hope is that you shouldn't need such checks if you use the regex
syntax common to TCL and PCRE, but if you have to, you can use this. But I
guess we could also define some wxHAS_PCRE if you think it's worth it.
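
A minimal sketch of such a compile-time check, assuming wxCHECK_VERSION(3, 1, 6) is the right threshold as discussed above. MYAPP_WXREGEX_IS_PCRE and the two functions are hypothetical project-local names, not anything wxWidgets defines, and the fallback definition of wxCHECK_VERSION exists only so that the sketch compiles without wxWidgets installed. The named-group spelling is used purely as an example of PCRE-only syntax.

```cpp
#include <cassert>
#include <string>

// Use the real macro when the wxWidgets headers are available.
#if defined(__has_include)
#  if __has_include(<wx/version.h>)
#    include <wx/version.h>
#  endif
#endif

// Fallback for this sketch only: without wxWidgets, behave as if building
// against an old, pre-PCRE version.
#ifndef wxCHECK_VERSION
#  define wxCHECK_VERSION(major, minor, release) 0
#endif

// Hypothetical project-local macro.
#if wxCHECK_VERSION(3, 1, 6)
#  define MYAPP_WXREGEX_IS_PCRE 1
#else
#  define MYAPP_WXREGEX_IS_PCRE 0
#endif

// Pattern for "major.minor" strings, spelled differently per backend.
inline std::string VersionPattern()
{
#if MYAPP_WXREGEX_IS_PCRE
    return "(?<major>[0-9]+)\\.(?<minor>[0-9]+)"; // PCRE-only named groups
#else
    return "([0-9]+)\\.([0-9]+)";                 // syntax common to both
#endif
}

// Sanity check: both spellings contain the escaped dot separator.
inline void CheckVersionPattern()
{
    assert(VersionPattern().find("\\.") != std::string::npos);
}
```

If the pattern can be written in the common syntax, as in the second branch, no check is needed at all, which is the preferable outcome.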

TP> 4. If I were you I wouldn't be confident in the all-or-nothing change to
TP> PCRE, given the length of "Other changes are:". I suppose time will tell.

I'm not 100% confident, but I don't think it's reasonable to have a choice
between TCL and PCRE, i.e. either we should do what I've done, or nothing
at all and stay with the old library forever. And let's say that for now
I'm still more than 50% confident that the former is better than the
latter. But, as you say, I might regret it later...

Regards,
VZ

Stefan Csomor

Aug 11, 2021, 9:55:12 AM
to wx-...@googlegroups.com
Hi

As the Xcode files have not been rebuilt and I don't have the time at the moment to do this, I'd just like to turn off PCRE for Xcode, but

#define wxUSE_PCRE 0

results in undefined macros PCRE2_MAJOR, PCRE2_MINOR.

I don't think that call existed pre-PCRE, so what should it return in that case?

Thanks,

Stefan

Vadim Zeitlin

Aug 11, 2021, 10:14:24 AM
to wx-...@googlegroups.com
On Wed, 11 Aug 2021 13:55:09 +0000 Stefan Csomor wrote:

SC> As the Xcode files have not been rebuilt

Hi,

Sorry about this, but I'm really not on friendly terms with Xcode, so I
don't know how to do this. In principle, compiling PCRE should be trivial,
it's just a bunch of C files and there are wx-specific headers in src/wx
subdirectory that should be used when not using configure.

SC> I'd just like to turn off PCRE for Xcode, but
SC>
SC> #define wxUSE_PCRE 0

There is no such option, you should turn off wxUSE_REGEX to avoid using
PCRE.

SC> Results in undefined macros PCRE2_MAJOR, PCRE2_MINOR

Sorry, where does this happen?
VZ

Stefan Csomor

Aug 11, 2021, 12:02:29 PM
to wx-...@googlegroups.com
Hi

On 11.08.21, 16:14, "Vadim Zeitlin" <wx-...@googlegroups.com on behalf of va...@wxwidgets.org> wrote:

> On Wed, 11 Aug 2021 13:55:09 +0000 Stefan Csomor wrote:
>
> SC> As the Xcode files have not been rebuilt
>
> Hi,
>
> Sorry about this, but I'm really not on friendly terms with Xcode, so I
> don't know how to do this. In principle, compiling PCRE should be trivial,
> it's just a bunch of C files and there are wx-specific headers in src/wx
> subdirectory that should be used when not using configure.

OK, so these are not listed somewhere in a files list file?

> SC> I'd just like to turn off PCRE for Xcode, but
> SC>
> SC> #define wxUSE_PCRE 0
>
> There is no such option, you should turn off wxUSE_REGEX to avoid using
> PCRE.

OK, I'll do this so that I can look at the rest first.

> SC> Results in undefined macros PCRE2_MAJOR, PCRE2_MINOR
>
> Sorry, where does this happen?

When I just set wxUSE_PCRE from 1 to 0 in regex.cpp.

Thanks,

Stefan

Vadim Zeitlin

Aug 11, 2021, 12:05:45 PM
to wx-...@googlegroups.com
On Wed, 11 Aug 2021 16:02:24 +0000 Stefan Csomor wrote:

SC> Sorry about this, but I'm really not on friendly terms with Xcode, so I
SC> don't know how to do this. In principle, compiling PCRE should be trivial,
SC> it's just a bunch of C files and there are wx-specific headers in src/wx
SC> subdirectory that should be used when not using configure.
SC>
SC> ok, so these are not listed somewhere in a files list file ?

They're listed in build/bakefiles/regex.bkl, sorry for forgetting to
mention this.

SC> SC> Results in undefined macros PCRE2_MAJOR, PCRE2_MINOR
SC>
SC> Sorry, where does this happen?
SC>
SC> When I just set wxUSE_PCRE from 1 to 0 in regex.cpp

I'll remove this define and all the code not guarded by it. As the comment
in front of it says, I only left it temporarily in case we needed to revert
this change entirely.

Regards,
VZ

Stefan Csomor

Aug 12, 2021, 9:15:24 AM
to wx-...@googlegroups.com
Hi

> SC> Sorry about this, but I'm really not on friendly terms with Xcode, so I
> SC> don't know how to do this. In principle, compiling PCRE should be trivial,
> SC> it's just a bunch of C files and there are wx-specific headers in src/wx
> SC> subdirectory that should be used when not using configure.
> SC>
> SC> ok, so these are not listed somewhere in a files list file ?
>
> They're listed in build/bakefiles/regex.bkl, sorry for forgetting to
> mention this.

Since I don't have an Xcode old enough to still run the AppleScript, I did it by hand. So both the macOS and iOS Xcode projects now use PCRE2.

Best,

Stefan
