Hi,
On Sunday, 7 November 2021 13:08:38 CET Matthew Vernon wrote:
> On 24/12/2020 11:56, Diederik de Haas wrote:
> > On my system I have the 8/16/32 bit versions of the pcre2 library
> > installed.
> > The description only tells me that this is the 8-bit runtime version.
> > But I have no idea why I/anyone would want an 8-bit runtime on my 64-bit
> > machine, where I'd normally expect (only) a 64-bit version, which
> > apparently, doesn't exist.
>
> The short answer is because you installed something that depends on the
> 8-bit runtime version.
I actually knew that, but I should have phrased my request more clearly.
$ aptitude why libpcre2-8-0
i git Depends libpcre2-8-0 (>= 10.34)
Let's take the Linux kernel as an example, which of course uses git and has
contributors from all over the world, including those whose names won't
always fit in ANSI/UTF-8 chars. Take a Japanese name, for instance.
So if I want to query git's log (assuming it uses RE for that) for commits by
a Japanese person, it won't be able to find it because it's using the 8-bit
variant of libpcre2?
> The slightly longer answer is that the X-bit naming refers to the size
> of code points - so the 8-bit version takes strings composed of chars,
> representing single-byte characters, or UTF-8 strings. The 16 and 32
> libraries instead take strings contained in arrays of 16 or 32-bit code
> units (which again might be single-unit characters or UTF-16 or UTF-32
> strings).
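To check my understanding of the above, here is a minimal Python sketch. The
sample name is made up, and Python's codecs only stand in for how an
application might prepare its input for each library width; no PCRE2 call is
involved. It counts the code units the same text occupies in each encoding:

```python
# Hypothetical sample: a Japanese name, chosen only for illustration.
name = "山田太郎"

def code_units(text: str, encoding: str, unit_bytes: int) -> int:
    """Number of fixed-width code units `text` occupies in `encoding`."""
    # The "-le" codecs emit no byte-order mark, so the count is exact.
    return len(text.encode(encoding)) // unit_bytes

units = {
    8: code_units(name, "utf-8", 1),       # 8-bit units (bytes)
    16: code_units(name, "utf-16-le", 2),  # 16-bit units
    32: code_units(name, "utf-32-le", 4),  # 32-bit units
}

for width, count in units.items():
    print(f"{width}-bit code units: {count}")
```

If I read the explanation correctly, the text itself is the same in all three
cases; only the size and number of the units handed to the library differ.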
My suspicion is that the choice between the 8-, 16- and 32-bit versions isn't
always made as consciously as it should be. Until your reply I wouldn't have
known which one to pick if I wanted to package a program, and would likely
have just copied what someone else had done (who may have followed the same
'procedure').
But if this information is added to the long description (plus possible
trade-offs*), then people can make a better-informed and thereby more
deliberate choice.
*) I don't know, but I can imagine that the 8-bit version is faster than the
16-bit one. The downside might be that you'd exclude REs using UTF-16/32
chars, like Japanese above. Depending on the use case, that can be acceptable.
Or not.
Cheers,
Diederik