(*UTF8) option valid or not?


David Wahlstedt

Aug 3, 2024, 10:00:56 AM
to PCRE2 discussion list
Hi,

I found by testing with pcre2test v 10.39 and v 10.45 that the option (*UTF8) is accepted, but I can't find any mention of it in the man pages.

What is the status of these?
(*UTF8)
(*UTF16)
(*UTF32)

Does it depend on compile options (8,16, or 32 bit)?
I have only 8 bit in my test environment so far.

```
PCRE2 version 10.45-DEV 2024-06-09 (8-bit)
/(*UTF8)a/debug
------------------------------------------------------------------
  0   5 Bra
  3     a
  5   5 Ket
  8     End
------------------------------------------------------------------
Capture group count = 0
Compile options: <none>
Overall options: utf
First code unit = 'a'
Subject length lower bound = 1
a
 0: a
```

Best regards,

David

Philip Hazel

Aug 3, 2024, 11:34:24 AM
to David Wahlstedt, PCRE2 discussion list
(*UTF8) was what was first used in PCRE1, long before there was 16-bit and 32-bit support. Subsequently (*UTF16) and (*UTF32) were added, but then I realized that (*UTF) was better because it could be used with strings of any code unit width. In an 8-bit library (*UTF) is equivalent to (*UTF8); in a 16-bit library it is the same as (*UTF16), and in a 32-bit library the same as (*UTF32). I kept these synonyms when moving to PCRE2 so that old patterns continued to work, but no longer documented them. Each one only works with the library of the matching width. Since PCRE2 has now been around almost 10 years, perhaps it is time to think about removing them, but they do no harm and perhaps there are still patterns that use them.
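
For an application that wants to accept the old spellings, the width rule described above could be sketched roughly as follows. This is only an illustration, not part of PCRE2: the function name `normalize_utf_verb` and the `width` parameter are hypothetical, it only looks at a verb at the very start of the pattern (a simplifying assumption), and raising `ValueError` on a width mismatch is a choice made for this sketch rather than a statement about how the library reports the error.

```python
# Legacy start-of-pattern verbs and the code unit width each one requires.
LEGACY_UTF_VERBS = {"(*UTF8)": 8, "(*UTF16)": 16, "(*UTF32)": 32}


def normalize_utf_verb(pattern: str, width: int) -> str:
    """Rewrite a leading legacy verb to (*UTF) when it matches `width`.

    `width` is the code unit width (8, 16 or 32) of the library the
    pattern will be compiled with. Hypothetical helper for illustration.
    """
    if width not in (8, 16, 32):
        raise ValueError(f"unsupported code unit width: {width}")
    for verb, required in LEGACY_UTF_VERBS.items():
        if pattern.startswith(verb):
            if required != width:
                # Assumption for this sketch: reject the mismatch outright.
                raise ValueError(f"{verb} needs a {required}-bit library, got {width}-bit")
            # Same meaning as the width-independent spelling.
            return "(*UTF)" + pattern[len(verb):]
    # No legacy verb at the start: leave the pattern untouched.
    return pattern
```

With this sketch, `normalize_utf_verb("(*UTF8)a", 8)` gives `"(*UTF)a"`, while `normalize_utf_verb("(*UTF16)a", 8)` raises `ValueError`.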

Regards,
Philip



David Wahlstedt

Aug 3, 2024, 11:53:18 AM
to PCRE2 discussion list
Thanks for the explanation! Then I will support them in my application, for compatibility.

David