On 08.04.2020 21:17, Jorgen Grahn wrote:
> On Wed, 2020-04-08, Alf P. Steinbach wrote:
>> On 08.04.2020 16:57, RM wrote:
>>> Are there named capturing groups in C++'s regexes, and can I use them
>>> with sub_match objects? I have a problem with translating from PHP to C++
>>> some code that uses PHP's preg_match_all function.
>>
>> Just note that the still young regex sub-library is scheduled for
>> deprecation, if it isn't deprecated already in C++17, because it just
>> doesn't know how to deal with UTF-8.
>
> Any reference for this? cppreference.com doesn't mention this; instead
> the regex library seems to have grown somewhat in C++17.

Not sure exactly where I picked that up, but Mr. Google now found a
February 15, 2020 comment by Tom Honermann:
“In Prague, during the SG16 discussion of P1844R1 - Enhancement of
regex, we had consensus to move forward with a proposal to deprecate
std::regex due to performance and ABI concerns.”
https://github.com/sg16-unicode/sg16/issues/57
There's no more than that and a link to the enhancement paper, but one
possible problem may be that e.g. regex-searching for three consecutive
non-space characters can find a single UTF-8 encoded character, if the
regex engine is specified or implemented as simple byte sequence searching.
Since non-ASCII UTF-8 chars consist entirely of bytes >= 128 and start
with a special bit pattern, I don't think erroneously finding things /within/
UTF-8 characters is a problem. But until such time as C++ support for
all kinds of character encodings is removed, there could be that problem
with other encodings; in particular my association circuit now pops
up Shift-JIS. So maybe they have considered that also, but it would be
nice to have some feature where you can say, “Hey, support only UTF-8!”. 😃
- Alf